Abstract


The semantics of concurrent data structures is usually given by a sequential specification and a 
consistency condition. Linearizability is the most popular consistency condition due to its simplicity
and general applicability. Nevertheless, for applications that do not require all guarantees
offered by linearizability, recent research has focused on improving performance and scalability 
of concurrent data structures by relaxing their semantics. 

In this paper, we present local linearizability, a relaxed consistency condition that is applicable 
to container-type concurrent data structures like pools, queues, and stacks. While linearizability 
requires that the effect of each operation is observed by all threads at the same time, local 
linearizability only requires that for each thread T, the effects of its local insertion operations and 
the effects of those removal operations that remove values inserted by T are observed by all threads 
at the same time. We investigate theoretical and practical properties of local linearizability and 
its relationship to many existing consistency conditions. We present a generic implementation 
method for locally linearizable data structures that uses existing linearizable data structures as 
building blocks. Our implementations show performance and scalability improvements over the 
original building blocks and outperform the fastest existing container-type implementations. 

1998 ACM Subject Classification D.3.1 [Programming Languages]: Formal Definitions and
Theory - Semantics; E.1 [Data Structures]: Lists, stacks, and queues; D.1.3 [Software]:
Programming Techniques - Concurrent Programming

Keywords and phrases (concurrent) data structures, relaxed semantics, linearizability 

1 Introduction

Concurrent data structures are pervasive all along the software stack, from operating system 
code to application software and beyond. Both correctness and performance are imperative 
for concurrent data structure implementations. Correctness is usually specified by relating
concurrent executions, admitted by the implementation, with sequential executions,
admitted by the sequential version of the data structure. The latter form the sequential
specification of the data structure. This relationship is formally captured by consistency
conditions, such as linearizability, sequential consistency, or quiescent consistency [25].

Linearizability [25] is the most accepted consistency condition for concurrent data structures
due to its simplicity and general applicability. It guarantees that the effects of all
operations by all threads are observed consistently. This global visibility requirement imposes
the need for extensive synchronization among threads, which may in turn jeopardize
performance and scalability. In order to enhance performance and scalability of implementations,
recent research has explored relaxed sequential specifications, resulting in
well-performing implementations of concurrent data structures. Except
for [57], the space of alternative consistency conditions that relax linearizability has been
left unexplored to a large extent. In this paper, we explore (part of) this gap by investigating
local linearizability, a novel consistency condition that is applicable to a large class
of concurrent data structures that we call container-type data structures, or containers for
short. Containers include pools, queues, and stacks. A fine-grained spectrum of consistency
conditions enables us to describe the semantics of concurrent implementations more
precisely, e.g., we show in our appendix that work stealing queues [25], which could only be
proven to be linearizable wrt pool, are actually locally linearizable wrt double-ended queue.

Local linearizability is a (thread-)local consistency condition that guarantees that insertions
per thread are observed consistently. While linearizability requires a consistent view over all
insertions, we only require that projections of the global history, so-called thread-induced
histories, are linearizable. The induced history of a thread T is a projection of a program
execution to the insert-operations in T combined with all remove-operations that remove values
inserted by T, irrespective of whether they happen in T or not. Then,
the program execution is locally linearizable iff each thread-induced history is linearizable.
Consider the example (sequential) history depicted in Figure 1. It is not linearizable wrt a
queue since the values are not dequeued in the same order as they were enqueued. However,
each thread-induced history is linearizable wrt a queue and, therefore, the overall execution
is locally linearizable wrt a queue. In contrast to semantic relaxations based on relaxing
sequential semantics such as [22112], local linearizability coincides with sequential correctness
for single-threaded histories, i.e., a single-threaded and, therefore, sequential history is
locally linearizable wrt a given sequential specification if and only if it is admitted by the
sequential specification.

Local linearizability is to linearizability what coherence is to sequential consistency.
Coherence [22], which is almost universally accepted as the absolute minimum that a shared
memory system should satisfy, is the requirement that there exists a unique global order per
shared memory location. Thus, while all accesses by all threads to a given memory location
have to conform to a unique order, consistent with program order, the relative ordering of
accesses to multiple memory locations does not have to be the same. In other words, coherence
is sequential consistency per memory location. Similarly, local linearizability is
linearizability per local history. In our view, local linearizability offers enough consistency for the
correctness of many applications as it is the local view of the client that often matters. For
example, in a locally linearizable queue each client (thread) has the impression of using a
perfect queue: no reordering will ever be observed among the values inserted by a single
thread. Such guarantees suffice for many e-commerce and cloud applications. Implementations
of locally linearizable data structures have been successfully applied for managing
free lists in the design of the fast and scalable memory allocator scalloc [5]. Moreover,
except for fairness, locally linearizable queues guarantee all properties required from Dispatch
Queues, a common concurrency programming mechanism on mobile devices.
Figure 1 (caption partially recovered): the thread-induced history of thread T2 is
enclosed by a solid line.

In this paper, we study theoretical and practical properties of local linearizability. Local
linearizability is compositional: a history over multiple concurrent objects is locally
linearizable iff all per-object histories are locally linearizable (see Thm. 12). Locally linearizable
container-type data structures, including queues and stacks, admit only “sane” behaviours:
no duplicated values, no values returned from thin air, and no values lost (see Prop. 4). Local
linearizability is a weakening of linearizability for a natural class of data structures including
pools, queues, and stacks (see Sec. 4). We compare local linearizability to linearizability,
sequential consistency, and quiescent consistency, and to many shared-memory consistency conditions.

Finally, local linearizability leads to new efficient implementations. We present a generic
implementation scheme that, given a linearizable implementation of a sequential specification S,
produces an implementation that is locally linearizable wrt S (see the section on
implementations). Our implementations show dramatic improvements in performance and scalability.
In most cases the locally linearizable implementations scale almost linearly and even outperform
state-of-the-art pool implementations. We produced locally linearizable variants of state-of-the-art
concurrent queues and stacks, as well as of the relaxed data structures from [231 EH]. The
latter are relaxed in two dimensions: they are locally linearizable (the consistency condition
is relaxed) and are out-of-order-relaxed (the sequential specification is relaxed). The
speedup of the locally linearizable implementation over the fastest linearizable queue (LCRQ)
and stack (TS Stack) implementation at 80 threads is 2.77 and 2.64, respectively. Verification
of local linearizability, i.e., proving correctness, for each of our new locally linearizable
implementations is immediate, given that the starting implementations are linearizable.


2 Semantics of Concurrent Objects

The common approach to define the semantics of an implementation of a concurrent data
structure is (1) to specify a set of valid sequential behaviors, the sequential specification, and
(2) to relate the admissible concurrent executions to sequential executions specified by the
sequential specification via the consistency condition. That means that an implementation
of a concurrent data structure actually corresponds to several sequential data structures, and
vice versa, depending on the consistency condition used. A (sequential) data structure D is
an object with a set of method calls $\Sigma$. We assume that method calls include parameters,
i.e., input and output values from a given set of values. The sequential specification S of D
is a prefix-closed subset of $\Sigma^*$. The elements of S are called D-valid sequences. For ease of
presentation, we assume that each value in a data structure can be inserted and removed at
most once. This is without loss of generality, as we may see the set of values as consisting
of pairs of elements (core values) and version numbers, i.e., $V = E \times \mathbb{N}$. Note that this
is a technical assumption that only makes the presentation and the proofs simpler; it is
not needed and not done in locally linearizable implementations. While elements may be
inserted and removed multiple times, the version numbers provide uniqueness of values. Our
assumption ensures that whenever a sequence s is part of a sequential specification S, then
each method call in s appears exactly once. An additional core value, that is not an element,
is empty. It is returned by remove method calls that do not find an element to return. We
denote by Emp the set of values that are versions of empty, i.e., $Emp = \{empty\} \times \mathbb{N}$.

► Definition 1 (Appears-before Order, Appears-in Relation). Given a sequence $s \in \Sigma^*$ in which
each method call appears exactly once, we denote by $\prec_s$ the total appears-before order over
method calls in s. Given a method call $m \in \Sigma$, we write $m \in s$ for m appears in s. ⋄

Throughout the paper, we will use pool, queue, and stack as typical examples of containers.
We specify their sequential specifications in an axiomatic way, i.e., as sets of
axioms that exactly define the valid sequences.

(1) $\forall i, j \in \{1, \dots, n\}.\ s = m_1 \dots m_n \wedge m_i = m_j \Rightarrow i = j$

(2) $\forall x \in V.\ r(x) \in s \Rightarrow i(x) \in s \wedge i(x) \prec_s r(x)$

(3) $\forall e \in Emp.\ \forall x \in V.\ i(x) \prec_s r(e) \Rightarrow r(x) \prec_s r(e)$

(4) $\forall x, y \in V.\ i(x) \prec_s i(y) \wedge r(y) \in s \Rightarrow r(x) \in s \wedge r(x) \prec_s r(y)$

(5) $\forall x, y \in V.\ i(x) \prec_s i(y) \prec_s r(x) \Rightarrow r(y) \in s \wedge r(y) \prec_s r(x)$

Table 1 The pool axioms (1), (2), (3); the queue order axiom (4); the stack order axiom (5)

► Definition 2 (Pool, Queue, & Stack). A pool, queue, and stack with values in a set V
have the sets of methods $\Sigma_P = \{ins(x), rem(x) \mid x \in V\} \cup \{rem(e) \mid e \in Emp\}$,
$\Sigma_Q = \{enq(x), deq(x) \mid x \in V\} \cup \{deq(e) \mid e \in Emp\}$, and
$\Sigma_S = \{push(x), pop(x) \mid x \in V\} \cup \{pop(e) \mid e \in Emp\}$, respectively. We denote the
sequential specification of a pool by $S_P$, the sequential specification of a queue by $S_Q$, and
the sequential specification of a stack by $S_S$. A sequence $s \in \Sigma_P^*$ belongs to $S_P$ iff it
satisfies axioms (1) - (3) in Table 1 (the pool axioms) when instantiating i() with ins() and
r() with rem(). We keep axiom (1) for completeness, although it is subsumed by our assumption
that each value is inserted and removed at most once. Specification $S_Q$ contains all sequences s
that satisfy the pool axioms and axiom (4) (the queue order axiom) after instantiating i() with
enq() and r() with deq(). Finally, $S_S$ contains all sequences s that satisfy the pool axioms and
axiom (5) (the stack order axiom) after instantiating i() with push() and r() with pop(). ⋄
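The axioms of Table 1 can be checked mechanically on a finite sequence. The following Python sketch is only an illustration and is not taken from the paper: it assumes a hypothetical encoding in which a method call is a pair `(kind, value)` with kind `"i"` (insert) or `"r"` (remove), and in which a version of empty is the pair `("empty", n)`. All function names are invented for this example.

```python
from itertools import combinations

def is_empty(value):
    """Versions of empty are encoded as ("empty", n) (an assumption of this sketch)."""
    return isinstance(value, tuple) and len(value) == 2 and value[0] == "empty"

def satisfies_pool(seq):
    """Check the pool axioms (1)-(3) on a list of (kind, value) method calls."""
    if len(set(seq)) != len(seq):          # axiom (1): no call appears twice
        return False
    pos = {call: k for k, call in enumerate(seq)}
    for (kind, value), k in pos.items():
        if kind != "r":
            continue
        if is_empty(value):
            # axiom (3): whatever was inserted before r(e) is removed before r(e)
            for (kind2, x), k2 in pos.items():
                if kind2 == "i" and k2 < k and pos.get(("r", x), len(seq)) > k:
                    return False
        elif pos.get(("i", value), len(seq)) > k:
            # axiom (2): a removed value was inserted earlier
            return False
    return True

def _inserted(seq):
    """Inserted values in the order in which they appear in seq."""
    return [value for kind, value in seq if kind == "i"]

def satisfies_queue(seq):
    """Pool axioms plus the queue order axiom (4)."""
    if not satisfies_pool(seq):
        return False
    pos = {call: k for k, call in enumerate(seq)}
    for x, y in combinations(_inserted(seq), 2):   # i(x) appears before i(y)
        if ("r", y) in pos and pos.get(("r", x), len(seq)) > pos[("r", y)]:
            return False
    return True

def satisfies_stack(seq):
    """Pool axioms plus the stack order axiom (5)."""
    if not satisfies_pool(seq):
        return False
    pos = {call: k for k, call in enumerate(seq)}
    for x, y in combinations(_inserted(seq), 2):   # i(x) appears before i(y)
        rx = pos.get(("r", x))
        if rx is not None and pos[("i", y)] < rx and pos.get(("r", y), len(seq)) > rx:
            return False
    return True
```

For the sequence i(1) i(2) r(2) r(1), `satisfies_pool` and `satisfies_stack` hold while `satisfies_queue` does not, which mirrors the difference between axioms (4) and (5).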

We represent concurrent executions via concurrent histories. An example history is
shown in Figure 1. Each thread executes a sequence of method calls from $\Sigma$; method
calls executed by different threads may overlap (which does not happen in Figure 1). The
real-time duration of method calls is irrelevant for the semantics of concurrent objects;
all that matters is whether method calls overlap. Given this abstraction, a concurrent
history is fully determined by a sequence of invocation and response events of method calls.
We distinguish method invocation and response events by augmenting the alphabet. Let
$\Sigma_i = \{m_i \mid m \in \Sigma\}$ and $\Sigma_r = \{m_r \mid m \in \Sigma\}$ denote the sets of method-invocation events
and method-response events, respectively, for the method calls in $\Sigma$. Moreover, let $I$ be the
set of thread identifiers. Let $\Sigma_i^I = \{m_i^k \mid m \in \Sigma, k \in I\}$ and $\Sigma_r^I = \{m_r^k \mid m \in \Sigma, k \in I\}$
denote the sets of method-invocation and -response events augmented with identifiers of
executing threads. For example, $m_i^k$ is the invocation of method call m by thread k. Before
we proceed, we mention a standard notion that we will need on several occasions.

► Definition 3 (Projection). Let s be a sequence over alphabet $\Sigma$ and $M \subseteq \Sigma$. By $s|M$
we denote the projection of s on the symbols in M, i.e., the sequence obtained from s by
removing all symbols that are not in M. ⋄

► Definition 4 (History). A (concurrent) history h is a sequence in $(\Sigma_i^I \cup \Sigma_r^I)^*$ where (1) no
invocation or response event appears more than once, i.e., if $h = m_1 \dots m_n$ and $m_h = m_*^k(x)$
and $m_j = m_*^l(x)$, for $* \in \{i, r\}$, then $h = j$ and $k = l$, and (2) if a response event $m_r^k$ appears
in h, then the corresponding invocation event $m_i^k$ also appears in h and $m_i^k \prec_h m_r^k$. ⋄

► Example 5. A queue history (left) and its formal representation as a sequence (right):

[Figure: thread T1 executes enq(2) and then deq(1); thread T2 executes enq(1).]

$enq(2)_i^1\ enq(1)_i^2\ enq(2)_r^1\ deq(1)_i^1\ enq(1)_r^2\ deq(1)_r^1$

A history is sequential if every response event is immediately preceded by its matching
invocation event and vice versa. Hence, we may ignore thread identifiers and identify
a sequential history with a sequence in $\Sigma^*$, e.g., enq(1)enq(2)deq(2)deq(1) identifies the
sequential history in Figure 1.

A history h is well-formed if $h|k$ is sequential for every thread identifier $k \in I$, where $h|k$
denotes the projection of h on the set $\{m_i^k \mid m \in \Sigma\} \cup \{m_r^k \mid m \in \Sigma\}$ of events that are local
to thread k. From now on we will use the term history for well-formed history. Also, we
may omit thread identifiers if they are not essential in a discussion.

A history h determines a partial order on its set of method calls, the precedence order: 

► Definition 6 (Appears-in Relation, Precedence Order). The set of method calls of a history h
is $M(h) = \{m \mid m_i \in h\}$. A method call m appears in h, notation $m \in h$, if $m \in M(h)$.
The precedence order for h is the partial order $<_h$ such that, for $m, n \in h$, we have that
$m <_h n$ iff $m_r \prec_h n_i$. By $<_h^k$ we denote $<_{h|k}$, the subset of the precedence order that relates
pairs of method calls of thread k, i.e., the program order of thread k. ⋄
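The notions of sequential history, well-formed history, and precedence order translate directly into code. The Python sketch below is illustrative only: it assumes a hypothetical encoding of a history as a list of `(thread, kind, call)` events with kind `"inv"` or `"res"`, it assumes that every call is completed (pending invocations are not handled), and the example history is made up for this illustration.

```python
def precedence_order(history):
    """Return the set of pairs (m, n) with m <_h n.

    history is a list of (thread, kind, call) events with kind "inv" or "res";
    each call is assumed to occur at most once, as in the text.
    """
    inv, res = {}, {}
    for index, (_thread, kind, call) in enumerate(history):
        (inv if kind == "inv" else res)[call] = index
    # m <_h n iff the response of m appears before the invocation of n
    return {(m, n) for m in res for n in inv if m != n and res[m] < inv[n]}

def is_sequential(history):
    """Every invocation is immediately followed by its matching response."""
    if len(history) % 2 != 0:
        return False
    for k in range(0, len(history), 2):
        (t1, kind1, call1), (t2, kind2, call2) = history[k], history[k + 1]
        if (kind1, kind2) != ("inv", "res") or (t1, call1) != (t2, call2):
            return False
    return True

def is_well_formed(history):
    """Each per-thread projection is sequential (complete histories only)."""
    threads = {thread for thread, _kind, _call in history}
    return all(is_sequential([e for e in history if e[0] == t]) for t in threads)

# Illustrative history: T1 runs enq(2) then deq(1), T2 runs enq(1), with overlaps.
history = [
    (1, "inv", "enq(2)"), (2, "inv", "enq(1)"), (1, "res", "enq(2)"),
    (1, "inv", "deq(1)"), (2, "res", "enq(1)"), (1, "res", "deq(1)"),
]
```

In this history only enq(2) precedes deq(1); enq(1) overlaps both other calls, so the precedence order is partial and the history is well-formed but not sequential.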

We can characterize a sequential history as a history whose precedence order is total. In
particular, the precedence order $<_s$ of a sequential history s coincides with its appears-before
order $\prec_s$. The total order for history s in Fig. 1 is $enq(1) <_s enq(2) <_s deq(2) <_s deq(1)$.

► Definition 7 (Projection to a set of method calls). Let h be a history, $M \subseteq \Sigma$,
$M_i^I = \{m_i^k \mid m \in M, k \in I\}$, and $M_r^I = \{m_r^k \mid m \in M, k \in I\}$. Then, we write $h|M$ for
$h|(M_i^I \cup M_r^I)$. ⋄

Note that $h|M$ inherits h's precedence order: $m <_{h|M} n \Leftrightarrow m \in M \wedge n \in M \wedge m <_h n$.

A history h is complete if the response of every invocation event in h appears in h. Given
a history h, Complete(h) denotes the set of all completions of h, i.e., the set of all complete
histories that are obtained from h by appending missing response events and/or removing
pending invocation events. Note that Complete(h) = {h} iff h is a complete history.

A concurrent data structure D over a set of methods $\Sigma$ is a (prefix-closed) set of concurrent
histories over $\Sigma$. A history may involve several concurrent objects. Let O be a set of
concurrent objects with individual sets of method calls $\Sigma_q$ and sequential specifications $S_q$
for each object $q \in O$. A history h over O is a history over the (disjoint) union of method
calls of all objects in O, i.e., it has a set of method calls $\bigcup_{q \in O} \{q.m \mid m \in \Sigma_q\}$. The added
prefix q. ensures that the union is disjoint. The projection of h to an object $q \in O$, denoted
by $h|q$, is the history with a set of method calls $\Sigma_q$ obtained by removing the prefix q. in
every method call in $h|\{q.m \mid m \in \Sigma_q\}$.

► Definition 8 (Linearizability [26]). A history h is linearizable wrt the sequential specification S
if there is a sequential history $s \in S$ and a completion $h_c \in Complete(h)$ such
that (1) s is a permutation of $h_c$, and (2) s preserves the precedence order of $h_c$, i.e., if
$m <_{h_c} n$, then $m <_s n$. We refer to s as a linearization of h. A concurrent data structure
D is linearizable wrt S if every history h of D is linearizable wrt S. A history h over a set
of concurrent objects O is linearizable wrt the sequential specifications $S_q$ for $q \in O$ if there
exists a linearization s of h such that $s|q \in S_q$ for each object $q \in O$. ⋄
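For small complete histories, the definition of linearizability can be tested by exhaustive search over permutations. The Python sketch below is a brute-force illustration and not an algorithm from the paper. It assumes that the precedence order is given as a set of pairs of calls, that calls are distinct `(kind, value)` pairs, and that a queue specification can be checked by simulating a FIFO queue; completions of incomplete histories are not handled, and the marker `"empty"` is an unversioned simplification.

```python
from collections import deque
from itertools import permutations

def fifo_spec(seq):
    """Membership in a queue specification, checked by simulating a FIFO queue."""
    queue = deque()
    for kind, value in seq:
        if kind == "enq":
            queue.append(value)
        elif value == "empty":
            if queue:                      # deq may return empty only on an empty queue
                return False
        elif not queue or queue.popleft() != value:
            return False
    return True

def is_linearizable(calls, precedence, spec):
    """Search for a permutation of calls that respects precedence and is in spec."""
    for candidate in permutations(calls):
        pos = {call: k for k, call in enumerate(candidate)}
        if all(pos[m] < pos[n] for m, n in precedence) and spec(candidate):
            return True
    return False

calls = [("enq", 1), ("enq", 2), ("deq", 2)]
# Both enqueues overlap each other and both precede the dequeue.
overlapping = {(("enq", 1), ("deq", 2)), (("enq", 2), ("deq", 2))}
# Same calls, but enq(1) now also precedes enq(2).
ordered = overlapping | {(("enq", 1), ("enq", 2))}
```

With overlapping enqueues the search finds the witness enq(2) enq(1) deq(2); once enq(1) precedes enq(2), no witness exists. The search is factorial in the number of calls, so it is only suitable for tiny examples.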

3 Local Linearizability

Local linearizability is applicable to containers whose set of method calls is a disjoint union
$\Sigma = Ins \cup Rem \cup DOb \cup SOb$ of insertion method calls Ins, removal method calls Rem,
data-observation method calls DOb, and (global) shape-observation method calls SOb. Insertions
(removals) insert (remove) a single value in the data set V or empty; data observations return
a single value in V; shape observations return a value (not necessarily in V) that provides
information on the shape of the state, for example, the size of a data structure. Examples
of data observations are head(x) (queue), top(x) (stack), and peek(x) (pool). Examples of
shape observations are empty(b), which returns true if the data structure is empty and false
otherwise, and size(n), which returns the number of elements in the data structure.

Even though we refrain from formal definitions, we want to stress that a valid sequence
of a container remains valid after deleting observer method calls:

$S|(Ins \cup Rem) \subseteq S. \quad (1)$

There are also containers with multiple insert/remove methods, e.g., a double-ended queue
(deque) is a container with insert-left, insert-right, remove-left, and remove-right methods,
to which local linearizability is also applicable. However, local linearizability requires that
each method call is either an insertion, or a removal, or an observation. As a consequence,
set is not a container according to our definition, as in a set ins(x) acts as a global observer
first, checking whether (some version of) x is already in the set, and if not inserts x. Also
hash tables are not containers for a similar reason.

Note that the arity of each method call in a container being one excludes data structures 
like snapshot objects. It is possible to deal with higher arities in a fairly natural way, 
however, at the cost of complicated presentation. We chose to present local linearizability 
on simple containers only. We present the definition of local linearizability without shape
observations here and discuss shape observations in Appendix A.

► Definition 9 (In- and out-methods). Let h be a container history. For each thread T
we define two subsets of the methods in h, called in-methods $I_T$ and out-methods $O_T$ of
thread T, respectively:

$I_T = \{m \mid m \in M(h|T) \cap Ins\}$

$O_T = \{m(a) \in M(h) \cap Rem \mid ins(a) \in I_T\} \cup \{m(e) \in M(h) \cap Rem \mid e \in Emp\} \cup \{m(a) \in M(h) \cap DOb \mid ins(a) \in I_T\}$. ⋄

Hence, the in-methods for thread T are all insertions performed by T. The out-methods 
are all removals and data observers that return values inserted by T. Removals that remove 
the value empty are also automatically added to the out-methods of T as any thread (and 
hence also T) could be the cause of “inserting” empty. This way, removals of empty serve as 
means for global synchronization. Without them each thread could perform all its operations 
locally without ever communicating with the other threads. Note that the out-methods $O_T$
of thread T need not be performed by T, but they return values that are inserted by T.

► Definition 10 (Thread-induced History). Let h be a history. The thread-induced history
$h_T$ is the projection of h to the in- and out-methods of thread T, i.e., $h_T = h|(I_T \cup O_T)$. ⋄

► Definition 11 (Local Linearizability). A history h is locally linearizable wrt a sequential
specification S if (1) each thread-induced history $h_T$ is linearizable wrt S, and (2) the
thread-induced histories $h_T$ form a decomposition of h, i.e., $m \in h \Rightarrow m \in h_T$ for some
thread T. A data structure D is locally linearizable wrt S if every history h of D is locally
linearizable wrt S. A history h over a set of concurrent objects O is locally linearizable wrt
the sequential specifications $S_q$ for $q \in O$ if each thread-induced history is linearizable over O
and the thread-induced histories form a decomposition of h, i.e., $q.m \in h \Rightarrow q.m \in h_T$ for
some thread T. ⋄
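The definitions of thread-induced history and local linearizability can be exercised on a small example. The Python sketch below is illustrative only. It assumes a hypothetical encoding of a complete history as a list of `Call` records with invocation and response times, restricts itself to insertions and removals (no observers), uses a brute-force linearizability search, and checks a queue specification by simulating a FIFO queue. The assignment of the removals to threads in the example is an assumption made for illustration.

```python
from collections import deque, namedtuple
from itertools import permutations

# A completed method call with its invocation and response times.
Call = namedtuple("Call", "thread kind value start end")

def fifo_spec(seq):
    """Queue specification, checked by simulating a FIFO queue."""
    queue = deque()
    for call in seq:
        if call.kind == "ins":
            queue.append(call.value)
        elif call.value == "empty":
            if queue:
                return False
        elif not queue or queue.popleft() != call.value:
            return False
    return True

def linearizable(calls, spec):
    """Brute force: some permutation respects real-time order and is in spec."""
    for candidate in permutations(calls):
        pos = {call: k for k, call in enumerate(candidate)}
        respects = all(pos[m] < pos[n]
                       for m in calls for n in calls if m.end < n.start)
        if respects and spec(candidate):
            return True
    return False

def induced_history(history, thread):
    """Insertions by thread plus removals of its values or of empty."""
    own = {c.value for c in history if c.kind == "ins" and c.thread == thread}
    return [c for c in history
            if (c.kind == "ins" and c.thread == thread)
            or (c.kind == "rem" and (c.value in own or c.value == "empty"))]

def locally_linearizable(history, spec):
    """Each induced history is linearizable and together they cover the history."""
    parts = [induced_history(history, t) for t in {c.thread for c in history}]
    covered = {c for part in parts for c in part}
    return covered == set(history) and all(linearizable(p, spec) for p in parts)

# Sequential history ins(1) ins(2) rem(2) rem(1), spread over two threads.
history = [
    Call("T1", "ins", 1, 0, 1), Call("T2", "ins", 2, 2, 3),
    Call("T1", "rem", 2, 4, 5), Call("T2", "rem", 1, 6, 7),
]
```

The example history is not linearizable wrt the queue specification, since 2 is removed before 1, but each thread-induced history (ins(1) rem(1) for T1 and ins(2) rem(2) for T2) is, so the history is locally linearizable.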

Local linearizability is sequentially correct, i.e., a single-threaded (necessarily sequential)
history h is locally linearizable wrt a sequential specification S iff $h \in S$. Like
linearizability [25], local linearizability is compositional. The complete proof of the following theorem
and missing or extended proofs of all following properties can be found in Appendix B.

► Theorem 12 (Compositionality). A history h over a set of objects O with sequential specifications
$S_q$ for $q \in O$ is locally linearizable iff $h|q$ is locally linearizable wrt $S_q$ for every $q \in O$.

Proof (Sketch). The property follows from the compositionality of linearizability and the
fact that $(h|q)_T = h_T|q$ for every thread T and object q. ◄


The Choices Made. Splitting a global history into subhistories and requiring consistency
for each of them is central to local linearizability. While this is common in shared-memory
consistency conditions, our study of local linearizability is a first
step in exploring subhistory-based consistency conditions for concurrent objects.

We chose thread-induced subhistories since thread-locality reduces contention in concurrent
objects and is known to lead to high performance as confirmed by our experiments. To
assign method calls to thread-induced histories, we took a data-centric point of view by (1)
associating data values to threads, and (2) gathering all method calls that insert/return a
data value into the subhistory of the associated thread (Def. 9). We associate data values to
the thread that inserts them. One can think of alternative approaches, for example, associate
with a thread the values that it removed. In our view, the advantages of our choice are
clear: First, by assigning inserted values to threads, every value in the history is assigned
to some thread. In contrast, in the alternative approach, it is not clear where to assign the
values that are inserted but not removed. Second, assigning inserted values to the inserting
thread enables eager removals and ensures progress in locally linearizable data structures.
In the alternative approach, it seems like the semantics of removing empty should be local.

An orthogonal issue is to assign values from shape observations to threads. In Appendix A,
we discuss two meaningful approaches and show how local linearizability can be
extended towards shape and data observations that appear in insertion operations of sets.

Finally, we have to choose a consistency condition required for each of the subhistories. 
We chose linearizability as it is the best (strong) consistency condition for concurrent objects. 


4 Local Linearizability vs. Linearizability


We now investigate the connection between local linearizability and linearizability. 

► Proposition 1 (Lin 1). In general, linearizability does not imply local linearizability. 


Proof. We provide an example of a data structure that is linearizable but not locally
linearizable. Consider a sequential specification $S_{NearlyQ}$ which behaves like a queue except
when the first two insertions were performed without a removal in between; then
the first two elements are removed out of order. Formally, $s \in S_{NearlyQ}$ iff
(1) $s = s_1\, enq(a)\, enq(b)\, s_2\, deq(b)\, s_3\, deq(a)\, s_4$ where
$s_1\, enq(a)\, enq(b)\, s_2\, deq(a)\, s_3\, deq(b)\, s_4 \in S_Q$ and $s_1 \in \{deq(e) \mid e \in Emp\}^*$
for some $a, b \in V$, or (2) $s \in S_Q$ and $s \neq s_1\, enq(a)\, enq(b)\, s_2$ for
$s_1 \in \{deq(e) \mid e \in Emp\}^*$ and $a, b \in V$. The example below is linearizable wrt $S_{NearlyQ}$.
However, T1's induced history enq(1)enq(2)deq(1)deq(2) is not.


[Figure: thread T1 executes enq(1), enq(2), deq(3), deq(2); thread T2 executes enq(3), deq(1).]


◄ 


The following condition on a data structure specification is sufficient for linearizability 
to imply local linearizability and is satisfied, e.g., by pool, queue, and stack. 
Figure 2 LL, not SC (Pool, Queue, Stack)

Figure 3 SC, not LL (Pool, Queue, Stack)


► Definition 13 (Closure under Data-Projection). A seq. specification S over $\Sigma$ is closed
under data-projection* iff for all $s \in S$ and all $V' \subseteq V$, $s|\{m(x) \in \Sigma \mid x \in V' \cup Emp\} \in S$. ⋄

For $s = enq(1)\,enq(3)\,enq(2)\,deq(3)\,deq(1)\,deq(2)$ we have $s \in S_{NearlyQ}$, but
$s|\{enq(x), deq(x) \mid x \in \{1, 2\} \cup Emp\} \notin S_{NearlyQ}$, i.e., $S_{NearlyQ}$ is not closed under data-projection.
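The counterexample to closure under data-projection can be replayed in code. The Python sketch below is illustrative only. It encodes method calls as `(kind, value)` pairs, uses an unversioned `"empty"` marker as a simplification, checks the queue specification by simulating a FIFO queue, and follows the formal two-case definition of the nearly-queue specification from the proof of Proposition 1 literally. The function names are invented for this example.

```python
from collections import deque

def fifo(seq):
    """Queue specification, checked by simulating a FIFO queue."""
    queue = deque()
    for kind, value in seq:
        if kind == "enq":
            queue.append(value)
        elif value == "empty":
            if queue:
                return False
        elif not queue or queue.popleft() != value:
            return False
    return True

def nearly_q(seq):
    """Membership in the nearly-queue specification (two-case definition)."""
    k = 0
    while k < len(seq) and seq[k] == ("deq", "empty"):
        k += 1                             # skip the prefix s_1 of empty removals
    two_enqs = (k + 1 < len(seq)
                and seq[k][0] == "enq" and seq[k + 1][0] == "enq")
    if not two_enqs:
        return fifo(seq)                   # case (2): plain queue behaviour
    a, b = seq[k][1], seq[k + 1][1]
    if ("deq", a) not in seq or ("deq", b) not in seq:
        return False
    ia, ib = seq.index(("deq", a)), seq.index(("deq", b))
    if ib > ia:                            # case (1) needs deq(b) before deq(a)
        return False
    swapped = list(seq)
    swapped[ia], swapped[ib] = swapped[ib], swapped[ia]
    return fifo(swapped)                   # swapping back must give a queue sequence

def project(seq, values):
    """Data-projection: keep calls on the given values and on empty."""
    return [m for m in seq if m[1] in values or m[1] == "empty"]

s = [("enq", 1), ("enq", 3), ("enq", 2), ("deq", 3), ("deq", 1), ("deq", 2)]
```

Here `nearly_q(s)` holds, while `nearly_q(project(s, {1, 2}))` does not: the projection enq(1) enq(2) deq(1) deq(2) is an ordinary queue sequence, which the nearly-queue specification excludes.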

► Proposition 2 (Lin 2). Linearizability implies local linearizability for sequential specifications
that are closed under data-projection.


Proof (Sketch). The property follows from Definition 13 and Equation (1). ◄




There exist corner cases where local linearizability coincides with linearizability, e.g., for
$S = \emptyset$ or $S = \Sigma^*$, or for single-producer/multiple-consumer histories.

We now turn our attention to pool, queue, and stack. 

► Proposition 3. The seq. specifications $S_P$, $S_Q$, and $S_S$ are closed under data-projection.


Proof (Sketch). Let $s \in S_P$, $V' \subseteq V$, and let $s' = s|(\{ins(x), rem(x) \mid x \in V' \cup Emp\})$.
Then, it suffices to check that all axioms for pool (Definition 2 and Table 1) hold for s'. ◄

► Theorem 14 (Pool & Queue & Stack, Lin). For pool, queue, and stack, local linearizability 
is (strictly) weaker than linearizability. 


Proof. Linearizability implies local linearizability for pool, queue, and stack as a consequence
of Proposition 2 and Proposition 3. The history in Figure 2 is locally linearizable
but not linearizable wrt pool, queue and stack (after suitable renaming of method calls). ◄


Although local linearizability wrt a pool does not imply linearizability wrt a pool
(Theorem 14), it still guarantees several properties that ensure sane behavior as stated next.


► Proposition 4 (LocLin Pool). Let h be a locally linearizable history wrt a pool. Then:

1. No value is duplicated, i.e., every remove method appears in h at most once.

2. No out-of-thin-air values, i.e., $\forall x \in V.\ rem(x) \in h \Rightarrow ins(x) \in h \wedge rem(x) \not<_h ins(x)$.

3. No value is lost, i.e., $\forall x \in V.\ \forall e \in Emp.\ rem(e) <_h rem(x) \Rightarrow ins(x) \not<_h rem(e)$ and
$\forall x \in V.\ \forall e \in Emp.\ ins(x) <_h rem(e) \Rightarrow rem(x) \in h \wedge rem(e) \not<_h rem(x)$.


Proof. By direct unfolding of the definitions. 


Note that if a history h is linearizable wrt a pool, then all of the three stated properties 
hold, as a consequence of linearizability and the definition of $S_P$.


5 Local Linearizability vs. Other Relaxed Consistency Conditions

We compare local linearizability with other classical consistency conditions to better
understand its guarantees and implications.


* The same notion (closure under data-projection) has been used in [7] under the name closure under projection.

Sequential Consistency (SC). A history h is sequentially consistent [151 HO] wrt a sequential
specification S, if there exists a sequential history $s \in S$ and a completion
$h_c \in Complete(h)$ such that (1) s is a permutation of $h_c$, and (2) s preserves each thread's
program order, i.e., if $m <_h^T n$, for some thread T, then $m <_s n$. We refer to s as a sequential
witness of h. A data structure D is sequentially consistent wrt S if every history h of D is
sequentially consistent wrt S.

Sequential consistency is a useful consistency condition for shared memory but it is not
really suitable for data structures as it allows for behavior that excludes any coordination
between threads: an implementation of a data structure in which every thread uses a
dedicated copy of a sequential data structure without any synchronization is sequentially
consistent. A sequentially consistent queue might always return empty in one (consumer)
thread as the point in time of the operation can be moved, e.g., see Figure 3. In a
producer-consumer scenario such a queue might end up with some threads not doing any work.


► Theorem 15 (Pool, Queue & Stack, SC). For pool, queue, and stack, local linearizability 
is incomparable to sequential consistency. ◄ 


Figures 2 and 3 give example histories that show the statement of Theorem 15. In contrast
to local linearizability, sequential consistency is not compositional [25].


(Quantitative) Quiescent Consistency (QC & QQC). Like linearizability and sequential
consistency, quiescent consistency also requires the existence of a sequential history,
a quiescent witness, that satisfies the sequential specification. All three consistency
conditions impose an order on the method calls of a concurrent history that a witness has to
preserve. Quiescent consistency uses the concept of quiescent states to relax the requirement
of preserving the precedence order imposed by linearizability. A quiescent state is a point
in a history at which there are no pending invocation events (all invoked method calls have
already responded). In a quiescent witness, a method call m has to appear before a method
call n if and only if there is a quiescent state between m and n. Method calls between
two consecutive quiescent states can be ordered arbitrarily. Quantitative quiescent consistency
refines quiescent consistency by bounding the number of reorderings of operations
between two quiescent states based on the concurrent behavior between these two states.
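Quiescent states are easy to locate in a history given as a sequence of events. The Python sketch below is illustrative only; it assumes a hypothetical encoding of a history as a list of `(thread, kind, call)` events with kind `"inv"` or `"res"`, and both example histories are made up for this illustration.

```python
def quiescent_points(history):
    """Return the positions after which no method call is pending.

    Position p means "after the first p events"; the trivially quiescent
    start of the history (p = 0) is not reported.
    """
    pending = set()
    points = []
    for index, (thread, kind, call) in enumerate(history):
        if kind == "inv":
            pending.add((thread, call))
        else:
            pending.discard((thread, call))
        if not pending:
            points.append(index + 1)
    return points

# Overlapping calls: some call is pending until the very last event.
overlapping = [
    (1, "inv", "enq(2)"), (2, "inv", "enq(1)"), (1, "res", "enq(2)"),
    (1, "inv", "deq(1)"), (2, "res", "enq(1)"), (1, "res", "deq(1)"),
]
# Sequential calls: a quiescent state follows every response.
sequential = [
    (1, "inv", "enq(1)"), (1, "res", "enq(1)"),
    (2, "inv", "deq(1)"), (2, "res", "deq(1)"),
]
```

In the overlapping history the only quiescent state is at the end, so a quiescent witness may order all three calls arbitrarily; in the sequential history a quiescent state separates the two calls, so their order must be preserved.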

The next result about quiescent consistency for pool is needed to establish the connection 
between quiescent consistency and local linearizability. 

► Proposition 5. A pool history h satisfying 1.-3. of Prop. 4 is quiescently consistent. ◄

From Prop. 4 and Prop. 5 it follows that local linearizability implies quiescent consistency for pool.

► Theorem 16 (Pool, Queue & Stack, QC). For pool, local linearizability is (strictly) stronger 

than quiescent consistency. For queue and stack, local linearizability is incomparable to 
quiescent consistency. ◄ 

Local linearizability also does not imply the stronger condition of quantitative quiescent
consistency. Like local linearizability, quiescent consistency and quantitative quiescent
consistency are compositional. For details, please see the appendix.

Consistency Conditions for Distributed Shared Memory. There is extensive research on
consistency conditions for distributed shared memory. In
Appendix E we compare local linearizability against coherence, PRAM consistency, processor
consistency, causal consistency, and local consistency. All these conditions split a
history into subhistories and require consistency of the subhistories. For our comparison,
Figure 4 Problematic shared-memory history. (Operations shown in the timeline: ins(1), ins(2), head(2), head(1), head(2), head(1).)


we first define a sequential specification $S_M$ for a single memory location. We assume that
each memory location is preinitialized with a value $v_{init} \in V$. A read-operation returns
the value of the last write-operation that was performed on the memory location or $v_{init}$ if
there was no write-operation. We denote write-operations by ins and read-operations by
head. Formally, we define $S_M$ as $S_M = \{head(v_{init})\}^* \cdot \{ins(v)\,head(v)^i \mid i \geq 0, v \in V\}^*$.
Note that read-operations are data observations and the same value can be read multiple
times. For brevity, we only consider histories that involve a single memory location. In the
following, we summarize our comparison. For details, please see Appendix E.
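
As a minimal sketch of the specification for a single memory location, the following check decides whether a sequential history of ins and head operations belongs to it. The encoding of operations as pairs and the placeholder `V_INIT` for the initial value are assumptions made for illustration.

```python
V_INIT = "init"  # hypothetical placeholder for the initial value v_init

def in_memory_spec(seq, v_init=V_INIT):
    """Check that every head returns the most recently written value,
    or v_init if nothing has been written yet."""
    current = v_init
    for op, value in seq:
        if op == "ins":        # write-operation
            current = value
        elif op == "head":     # read-operation; may repeat any number of times
            if value != current:
                return False
        else:
            raise ValueError(f"unknown operation: {op}")
    return True

ok = [("head", "init"), ("ins", 1), ("head", 1), ("head", 1), ("ins", 2), ("head", 2)]
oscillating = [("ins", 1), ("ins", 2), ("head", 2), ("head", 1)]
print(in_memory_spec(ok), in_memory_spec(oscillating))
```

The second sequence reads the older value 1 after the newer value 2 has been observed, which no sequential execution of a single memory location permits.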

While local linearizability is well-suited for concurrent data structures, this is not
necessarily true for the mentioned shared-memory consistency conditions. On the other hand,
local linearizability appears to be problematic for shared memory. Consider the locally
linearizable history in Figure 4. There, the read values oscillate between different values
that were written by different threads. Therefore, local linearizability does not imply any
of the shared-memory consistency conditions. In Appendix E, we further show that local
linearizability is incomparable to all considered shared-memory conditions.


In this section, we focus on locally linearizable data structure implementations that are
generic as follows: We choose a linearizable implementation of a data structure $\Phi$ wrt a
sequential specification $S_\Phi$ and turn it into a (distributed) data structure called LLD $\Phi$
that is locally linearizable wrt $S_\Phi$. An LLD implementation takes several copies of $\Phi$ (that
we call backends) and assigns to each thread T a backend $\Phi_T$. Then, when thread T inserts an
element into LLD $\Phi$, the element is inserted into $\Phi_T$, and when an arbitrary thread removes
an element from LLD $\Phi$, the element is removed from some $\Phi_T$ eagerly, i.e., if no element is
found in the attempted backend $\Phi_T$, the search for an element continues through all other
backends. If no element is found in one round through the backends, then we return empty.

► Proposition 6 (LLD correctness). Let $\Phi$ be a data structure implementation that is
linearizable wrt a sequential specification $S_\Phi$. Then LLD $\Phi$ is locally linearizable wrt $S_\Phi$.

Proof. Let h be a history of LLD $\Phi$. The crucial observation is that each thread-induced
history $h_T$ is a backend history of $\Phi_T$ and hence linearizable wrt $S_\Phi$. ◄

Any number of copies (backends) is allowed in this generic implementation of LLD $\Phi$.
If we take just one copy, we end up with a linearizable implementation. Also, any way of
choosing a backend for removals is fine. However, both the number of backends and the
backend selection strategy upon removals affect the performance significantly. In our LLD
$\Phi$ implementations we use one backend per thread, resulting in no contention on insertions,
and always attempt a local remove first. If this does not return an element, then we continue
a search through all other backends starting from a randomly chosen backend.
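
The generic construction can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation evaluated in this paper: the backend is a lock-based FIFO queue standing in for an arbitrary linearizable $\Phi$, threads are identified by a dense integer index, `None` encodes empty, and all class and method names are hypothetical.

```python
import random
import threading
from collections import deque

class LockedQueue:
    """Stand-in linearizable backend: a FIFO queue guarded by a single lock."""
    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def insert(self, x):
        with self._lock:
            self._items.append(x)

    def remove(self):
        with self._lock:
            return self._items.popleft() if self._items else None

class LLD:
    """One backend per thread; insert locally, remove locally first, then search."""
    def __init__(self, num_threads, backend=LockedQueue):
        self._backends = [backend() for _ in range(num_threads)]

    def insert(self, tid, x):
        self._backends[tid].insert(x)  # only thread tid inserts here

    def remove(self, tid):
        x = self._backends[tid].remove()  # local remove first
        if x is not None:
            return x
        n = len(self._backends)
        start = random.randrange(n)  # randomly chosen starting backend
        for k in range(n):           # one round through the other backends
            b = (start + k) % n
            if b == tid:
                continue
            x = self._backends[b].remove()
            if x is not None:
                return x
        return None  # empty

q = LLD(num_threads=2)
q.insert(0, 1)
q.insert(0, 2)
q.insert(1, 3)
print(q.remove(1), q.remove(1), q.remove(0), q.remove(0))
```

Thread 1 first obtains its own element 3 and then takes the oldest element of thread 0's backend; the order among elements inserted by the same thread is preserved, which is what local linearizability asks for.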

LLD $\Phi$ is an implementation closely related to Distributed Queues (DQs) [18]. A DQ
is a (linearizable) pool that is organized as a single segment of length $\ell$ holding $\ell$ backends.
DQs come in different flavours depending on how insert and remove methods are distributed
across the segment when accessing backends. No DQ variant in [18] follows the LLD
approach described above. Moreover, while DQ algorithms are implemented for a fixed number
of backends, LLD $\Phi$ implementations manage a segment of variable size, one backend per
(active) thread. Note that the strategy of selecting backends in the LLD implementations
is similar to other work in work stealing. However, in contrast to this work,
our data structures neither duplicate nor lose elements. LLD (stack) implementations have
been successfully applied for managing free lists in the fast and scalable memory allocator
scalloc. The guarantees provided by local linearizability are not needed for the correctness
of scalloc, i.e., the free lists could also use a weak pool (pool without a linearizable
emptiness check). However, the LLD stack implementations provide good caching behavior
when threads operate on their local stacks whereas a weak pool would potentially negatively
impact performance.

We have implemented LLD variants of strict and relaxed queue and stack implementations.
None of our implementations involves observation methods, but the LLD algorithm
can easily be extended to support observation methods. For details, please see App. F.4.
Finally, let us note that we have also experimented with other locally linearizable
implementations that lacked the genericity of the LLD implementations, and whose performance
evaluation did not show promising results (see the appendix). As shown earlier, a locally
linearizable pool is not a linearizable pool, i.e., it lacks a linearizable emptiness check. Indeed,
LLD implementations do not provide a linearizable emptiness check, despite eager
removes. We provide LL+D $\Phi$, a variant of LLD $\Phi$, that provides a linearizable emptiness
check under mild conditions on the starting implementation $\Phi$ (see App. F.1 for details).


Experimental Evaluation. All experiments ran on a uniform memory architecture (UMA)
machine with four 10-core 2GHz Intel Xeon E7-4850 processors supporting two hardware
threads (hyperthreads) per core, 128GB of main memory, and Linux kernel version 3.8.0.
We also ran the experiments without hyper-threading, resulting in no noticeable difference.
The CPU governor has been disabled. All measurements were obtained from the
artifact-evaluated Scal benchmarking framework, which also contains the code of
all involved data structures. Scal uses preallocated memory (without freeing it) to avoid
memory management artifacts. For all measurements we report the arithmetic mean and
the 95% confidence interval (sample size=10, corrected sample standard deviation).
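
For readers who want to reproduce this kind of summary statistic, the following sketch computes the arithmetic mean and a 95% confidence half-width from ten samples using the corrected sample standard deviation. It assumes a Student-t interval with 9 degrees of freedom; the text does not state which interval construction was used, and the function name is hypothetical.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t distribution with 9 degrees
# of freedom (assumption: a t-based interval for sample size 10).
T_CRIT_95_DF9 = 2.262

def mean_and_ci95(samples):
    """Return (mean, half_width) so that the interval is mean +/- half_width."""
    n = len(samples)
    if n != 10:
        raise ValueError("the critical value above is only valid for 10 samples")
    mean = statistics.mean(samples)
    s = statistics.stdev(samples)  # corrected (n - 1) sample standard deviation
    return mean, T_CRIT_95_DF9 * s / math.sqrt(n)

mean, half_width = mean_and_ci95([float(x) for x in range(1, 11)])
print(mean, round(half_width, 3))
```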

In our experiments, we consider the linearizable queues Michael-Scott queue (MS) [34]
and LCRQ (improved version), the linearizable stacks Treiber stack (Treiber) [33]
and TS stack, the k-out-of-order relaxed k-FIFO queue and k-Stack [33], and
linearizable well-performing pools based on distributed queues using random balancing [18] (1-RA
DQ for queue, and 1-RA DS for stack). For each of these implementations (but the pools)
we provide LLD variants (LLD LCRQ, LLD TS stack, LLD k-FIFO, and LLD k-Stack) and,
when possible, LL+D variants (LL+D MS queue and LL+D Treiber stack). Making the
pools locally linearizable is not promising as they are already distributed. Whenever LL+D
is achievable for a data structure implementation $\Phi$ we present only results for LL+D $\Phi$ as,
in our workloads, LLD $\Phi$ and LL+D $\Phi$ implementations perform with no visible difference.

We evaluate the data structures on a Scal producer-consumer benchmark where each
producer and consumer is configured to execute $10^6$ operations. To control contention, we
add a busy wait of 5 µs between operations. This is important as too high contention
results in measuring hardware or operating system (e.g., scheduling) artifacts. The number of
threads ranges between 2 and 80 (number of hardware threads), half of which are producers
and half consumers. To relate performance and scalability we report the number of data
Figure 5 Performance and scalability of producer-consumer microbenchmarks with an increasing
number of threads on a 40-core (2 hyperthreads per core) machine. (Panel shown: "stack-like" data structures.)


structure operations per second. Data structures that require parameters to be set are
configured to allow maximum parallelism for the producer-consumer workload with 80 threads.
This results in k = 80 for all k-FIFO and k-Stack variants (40 producers and 40 consumers
in parallel on a single segment), p = 80 for 1-RA-DQ and 1-RA-DS (40 producers and
40 consumers in parallel on different backends). The TS Stack algorithm also needs to be
configured with a delay parameter. We use optimal delay (7 µs) for the TS Stack and zero
delay for the LLD TS Stack, as delays degrade the performance of the LLD implementation.

Figure 5 shows the results of the producer-consumer benchmarks. Similar to experiments
performed elsewhere, the well-known algorithms MS and Treiber do
not scale for 10 or more threads. The state-of-the-art linearizable queue and stack algorithms
LCRQ and TS-interval Stack either perform competitively with their k-out-of-order relaxed
counterparts k-FIFO and k-Stack or even outperform and outscale them. For any
implementation $\Phi$, LLD $\Phi$ and LL+D $\Phi$ (when available) perform and scale significantly better
than $\Phi$ does, even slightly better than the state-of-the-art pool that we compare to. The
LLD variants of MS queue and Treiber stack show the largest improvement. The speedup of the locally
linearizable implementation over the fastest linearizable queue (LCRQ) and stack (TS Stack)
implementation at 80 threads is 2.77 and 2.64, respectively. The performance degradation
for LCRQ between 30 and 70 threads aligns with the performance of fetch-and-inc (the
CPU instruction that atomically retrieves and modifies the contents of a memory location)
on the benchmarking machine, which differs from its performance on the original benchmarking machine.
LCRQ uses fetch-and-inc as its key atomic instruction.

Conclusion & Future Work

Local linearizability splits a history into a set of thread-induced histories and requires
consistency of each of them. This yields an intuitive consistency condition for concurrent objects
that enables new data structure implementations with superior performance and scalability.
Local linearizability has desirable properties like compositionality and well-behavedness for
container-type data structures. As future work, it is interesting to investigate the guarantees
that local linearizability provides to client programs.
A Local Linearizability with Shape Observers


There are two possible ways to deal with shape observers: treat them locally, in the thread- 
induced history of the performing thread, or treat them globally. While a local treatment 
is immediate and natural to a local consistency condition, a global treatment requires care. 
We present both solutions next. 


► Definition 17 (Local Linearizability LSO). A history h is locally linearizable with local
shape observers (LSO) wrt a sequential specification S if it is locally linearizable according
to Definition 11 with the difference that the in-methods also contain all shape
observers performed by thread T, i.e., $I_T = \{m \mid m \in M(h|T) \cap (Ins \cup SOb)\}$.


Global observations require more notation and auxiliary notions. Let $s_j$ for $j \in J$ be
a collection of sequences over alphabet $\Sigma$ with pairwise disjoint sets of symbols $M(s_j)$. A
sequence s is an interleaving of $s_j$ for $j \in J$ if $M(s) = \bigcup_j M(s_j)$ and $s|M(s_j) = s_j$ for all
$j \in J$. We write $\prod_j s_j$ for the set of all interleavings of $s_j$ with $j \in J$.
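
The definition of an interleaving translates directly into a membership test. The sketch below assumes sequences are Python lists of hashable symbols; the function names are illustrative.

```python
def restrict(s, symbols):
    """Projection s|M: keep only the elements of s that belong to symbols."""
    return [x for x in s if x in symbols]

def is_interleaving(s, parts):
    """Check that s uses exactly the symbols of all parts and that projecting s
    onto the symbols of each part yields that part."""
    symbol_sets = [set(p) for p in parts]
    union = set()
    for syms in symbol_sets:
        if union & syms:
            raise ValueError("symbol sets must be pairwise disjoint")
        union |= syms
    if set(s) != union:
        return False
    return all(restrict(s, syms) == list(p) for p, syms in zip(parts, symbol_sets))

print(is_interleaving(["a", "x", "b", "y"], [["a", "b"], ["x", "y"]]))
print(is_interleaving(["b", "x", "a", "y"], [["a", "b"], ["x", "y"]]))
```

The first sequence keeps a before b and x before y and is therefore an interleaving; the second reverses a and b and is not.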

Given a history h and a method call $m \in h$, we write $h^m$ for the (incomplete) history
that is the prefix of h up to and without $m_r$, the response event of m. Hence, $h^m$ contains
all invocation and response events of h that appear before $m_r$.


► Definition 18. Let S denote the sequential specification of a container D. A shape
observer m in a history h has a witness if there exists a sequence $s \in \Sigma^*$ such that $sm \in S$
and $s \in \prod_T s_T$ for some $s_T$ that is a linearization of the thread-induced history $(h^m)_T$.

Informally, the above definition states that a global shape observer m must be justified 
by a (global) witness. Such a global witness is a sequence that (1) when extended by m 
belongs to the sequential specification, and (2) is an interleaving of linearizations of the 
thread-induced histories up to m. 


► Definition 19 (Local Linearizability GSO). A history h is locally linearizable with global
shape observers (GSO) wrt a sequential specification S if it is locally linearizable and each
shape observer $m \in SOb$ has a witness.

We illustrate the difference in the local vs. the global approach for shape observers with 
the following example. 

► Example 20. Consider the following queue history with global observer size()

T1: enq(1) enq(2)
T2: deq(1) size(n)

where n is just a placeholder for a concrete natural number. For n = 0, the history h is 
locally linearizable LSO, but not locally linearizable GSO. For n = 1, the history h is locally 
linearizable GSO, but not locally linearizable LSO. 

Global observers and non-disjoint operations are expected to have negative impact on 
performance. If one cares for global consistency, local linearizability is not the consistency 
condition to be used. The restriction to containers and disjoint operations specifies, in an 
informal way, the minimal requirements for local consistency to be acceptable. 

Neither sets nor maps are containers according to our definition. However, it is possible to 
extend our treatment to sets and maps similar to our treatment of global observers. Locally 
linearizable sets and maps will be weaker than their linearizable counterparts, but, due to
the tight coupling between mutator and observer effects, the gain in performance is unlikely 
to be as substantial as the one observed in other data structures. The technicalities needed 
to extend local linearizability to sets and maps would complicate the theoretical development 
without considerable benefits and we, therefore, excluded such data structures. 


B Additional Results and Proofs 


Theorem 12 (Compositionality). A history h over a set of objects O with sequential
specifications $S_q$ for $q \in O$ is locally linearizable if and only if $h|q$ is locally linearizable with
respect to $S_q$ for every $q \in O$.


Proof. The property follows from the compositionality of linearizability and the fact that
$(h|q)_T = h_T|q$ for every thread T and object q. Assume that h over O is locally
linearizable. This means that all thread-induced histories $h_T$ over O are linearizable. Hence, since
linearizability is compositional, for each object $q \in O$ the history $h_T|q$ is linearizable with
respect to $S_q$. Now from $(h|q)_T = h_T|q$ we have that for every object q the history $(h|q)_T$
is linearizable for every thread T.

Similarly, assume that for every object $q \in O$ the history $h|q$ is locally linearizable. Then,
for every q, $(h|q)_T = h_T|q$ is linearizable for every thread T. From the compositionality of
linearizability, $h_T$ is linearizable for every thread T. This proves that h is locally linearizable. ◄


Proposition (Lin vs. LocLin 2). Linearizability implies local linearizability for
sequential specifications that are closed under data-projection.

Proof. Assume we are given a history h which is linearizable with respect to a sequential
specification S that is closed under data-projection. Further assume that, without loss of
generality, h is complete. Then there exists a sequential history $s \in S$ such that (1) s is a
permutation of h, and (2) if $m <_h n$, then also $m <_s n$. Given a thread T, consider the
thread-induced history $h_T$ and let $s_T = s|(I_T \cup O_T)$. Then, $s_T$ is a permutation of $h_T$ since
$h_T$ and $s_T$ consist of the same events. Furthermore, $s_T \in S$ since S is closed under
data-projection and since Equation 0 holds for containers. Finally, we have for each $m \in h_T$
and $n \in h_T$ that, if $m <_{h_T} n$, then also $m <_{s_T} n$ since $m <_h n$ and therefore $m <_s n$, which
implies $m <_{s_T} n$. Thereby, we have shown that $h_T$ is linearizable with respect to S, for an
arbitrary thread T. Hence h is locally linearizable with respect to S. ◄

Proposition (Data-Projection Closedness). The sequential specifications of pool,
queue, and stack are closed under data-projection.

Proof. Let $s \in S_P$, $V' \subseteq V$, and let

$s' = s|(\{ins(x), rem(x) \mid x \in V' \cup Emp\})$.

Then, it suffices to check that all axioms for pool hold for $s'$.
Clearly, all methods in $s'$ appear at most once, as they do so in s. If $rem(x) \in s'$, then
$rem(x) \in s$ and, since $s \in S_P$, $ins(x) \prec_s rem(x)$. But then also $ins(x) \in s'$ and hence
$ins(x) \prec_{s'} rem(x)$. Finally, if $ins(x) \prec_{s'} rem(e)$ for $e \in Emp$, then $ins(x) \prec_s rem(e)$, implying
that $rem(x) \in s$ and $rem(x) \prec_s rem(e)$. But then $rem(x) \in s'$ as well and $rem(x) \prec_{s'} rem(e)$.
This shows that $S_P$ is closed under data-projection.

Assume now that $s \in S_Q$ and $s'$ is as before (with enq() and deq() for ins() and rem(),
respectively). Then, as $S_P$ is closed under data-projection, $s'$ satisfies the pool axioms.
Moreover, the queue-order axiom also holds: Assume $enq(x) \prec_{s'} enq(y)$
and $deq(y) \in s'$. Then $enq(x) \prec_s enq(y)$ and $deq(y) \in s$. Since $s \in S_Q$ we get
$deq(x) \in s$ and $deq(x) \prec_s deq(y)$. But this means $deq(x) \in s'$ and $deq(x) \prec_{s'} deq(y)$.
Hence, $S_Q$ is closed under data-projection.

Finally, if $s \in S_S$ and $s'$ is as before (with push() and pop() for ins() and rem(),
respectively), we need to check that the stack-order axiom holds.
Assume $push(x) \prec_{s'} push(y) \prec_{s'} pop(x)$. This implies $push(x) \prec_s push(y) \prec_s pop(x)$
and since $s \in S_S$ we get $pop(y) \in s$ and $pop(y) \prec_s pop(x)$. But then $pop(y) \in s'$ and
$pop(y) \prec_{s'} pop(x)$. So, $S_S$ is closed under data-projection. ◄
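
Data-projection closedness for the pool can be illustrated with a small executable check. The sketch assumes sequential histories are lists of (operation, value) pairs and uses `EMPTY` to encode a removal that returns empty; `project` and `is_pool_history` are illustrative names, and the check tests one example exhaustively rather than proving the proposition.

```python
from itertools import combinations

EMPTY = None  # hypothetical encoding of the value returned by rem(empty)

def project(s, values):
    """Data-projection: keep operations on the chosen values and all rem(empty)."""
    return [(op, v) for op, v in s if v is EMPTY or v in values]

def is_pool_history(s):
    """Check the pool axioms on a sequential history: every value is inserted and
    removed at most once, a removal follows its insertion, and rem(empty) occurs
    only when every previously inserted value has been removed."""
    inserted, present = set(), set()
    for op, v in s:
        if op == "ins":
            if v in inserted:
                return False
            inserted.add(v)
            present.add(v)
        elif v is EMPTY:
            if present:
                return False
        else:
            if v not in present:
                return False
            present.remove(v)
    return True

s = [("ins", 1), ("ins", 2), ("rem", 2), ("rem", 1), ("rem", EMPTY), ("ins", 3)]
subsets = [set(c) for r in range(4) for c in combinations([1, 2, 3], r)]
print(is_pool_history(s), all(is_pool_history(project(s, vs)) for vs in subsets))
```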

Proposition (LocLin Pool). Let h be a locally linearizable history wrt a pool. Then:

1. No value is duplicated, i.e., every remove method appears in h at most once.

2. There are no out-of-thin-air values, i.e.,

$\forall x \in V.\ rem(x) \in h \Rightarrow ins(x) \in h \wedge rem(x) \not<_h ins(x)$.

3. No value is lost, i.e., $\forall x \in V.\ \forall e \in Emp.\ ins(x) <_h rem(e) \Rightarrow rem(x) \in h \wedge rem(e) \not<_h rem(x)$
and $\forall x \in V.\ \forall e \in Emp.\ rem(e) <_h rem(x) \Rightarrow ins(x) \not<_h rem(e)$.

Proof. Note that if a history h is linearizable wrt a pool, then all of the three stated
properties hold, as a consequence of linearizability and the definition of $S_P$. Now assume
that h is locally linearizable wrt a pool.

If rem(x) appears twice in h, then it also appears twice in some thread-induced history
$h_T$, contradicting that $h_T$ is linearizable with respect to a pool. This shows that no value is
duplicated.

If $rem(x) \in h$, then $rem(x) \in h_T$ for some T and, since $h_T$ is linearizable with respect to
a pool, $ins(x) \in h_T$ and $rem(x) \not<_{h_T} ins(x)$. This yields $ins(x) \in h$ and $rem(x) \not<_h ins(x)$.
Hence, there are no thin-air values.

Finally, if $rem(e) \in h$ for $e \in Emp$ then $rem(e) \in h_T$ for all T. Let $ins(x) <_h rem(e)$ and
let $T'$ be such that $ins(x) \in h_{T'}$. Then $ins(x) <_{h_{T'}} rem(e)$ and since $h_{T'}$ is linearizable
with respect to a pool, $rem(x) \in h_{T'}$ and $rem(e) \not<_{h_{T'}} rem(x)$. This yields $rem(x) \in h$ and
$rem(e) \not<_h rem(x)$. Similarly, the other condition holds. Hence, no value is lost. ◄


Theorem 25 (Queue Local Linearizability). A queue concurrent history h is locally
linearizable with respect to the queue sequential specification $S_Q$ if and only if

1. h is locally linearizable with respect to the pool sequential specification $S_P$, and

2. $\forall x, y \in V.\ \forall i.\ enq(x) <_{h_i} enq(y) \wedge deq(y) \in h \Rightarrow deq(x) \in h \wedge deq(y) \not<_h deq(x)$.

Proof. Assume h is locally linearizable with respect to $S_Q$. Since $S_Q \subseteq S_P$ (with suitably
renamed method calls), h is locally linearizable with respect to $S_P$. Moreover, since all $h_i$
are linearizable with respect to $S_Q$, by Theorem 24 for all i we have $\forall x, y \in V.\ enq(x) <_{h_i}
enq(y) \wedge deq(y) \in h_i \Rightarrow deq(x) \in h_i \wedge deq(y) \not<_{h_i} deq(x)$.

Assume $x, y \in V$ are such that $enq(x) <_{h_i} enq(y)$ and $deq(y) \in h$. Then $enq(x) <_{h_i}
enq(y)$ and $deq(y) \in h_i$, so $deq(x) \in h_i$ and $deq(y) \not<_{h_i} deq(x)$. This implies $deq(x) \in h$
and $deq(y) \not<_h deq(x)$.

For the opposite, assume that conditions 1. and 2. hold for a history h. We need to
show that (1) the $h_i$ form a decomposition of h, which is clear for a queue, and (2) each $h_i$ is
linearizable with respect to $S_Q$.

By 1., each $h_i$ is linearizable with respect to a pool. Assume $enq(x) <_{h_i} enq(y)$ and
$deq(y) \in h_i$. Then $enq(x) <_{h_i} enq(y) \wedge deq(y) \in h$ and hence by 2., $deq(x) \in h \wedge
deq(y) \not<_h deq(x)$. Again, as $enq(x), deq(x) \in h_i$ we get $deq(x) \in h_i \wedge deq(y) \not<_{h_i} deq(x)$.

According to Theorem 24 this is enough to conclude that each $h_i$ is linearizable with respect
to $S_Q$. ◄

Theorem 15 (Pool, Queue, & Stack, SC). For pool, queue, and stack, local
linearizability is incomparable to sequential consistency.

Proof. The following histories, when instantiating i() with ins(), enq(), and push(), re¬ 
spectively, and instantiating r() with rem(), deq(), and pop(), respectively, are sequentially 
consistent but not locally linearizable wrt pool, queue and stack: 

(a) Pool:

T1: i(1) r(1)
T2: r(empty)

(b) Queue: operations i(1), i(2), r(1) on threads T1 and T2 (thread assignment not recoverable from the timeline)

(c) Stack:

T1: i(1) i(2)
T2: r(1) r(2)


History (a) is already not locally linearizable wrt pool, queue, and stack, respectively, 
histories (b) and (c) provide interesting examples. The history in Figure is locally lin¬ 
earizable but not sequentially consistent wrt a pool. The following histories are locally 
linearizable but not sequentially consistent wrt a queue and a stack, respectively: 

(d) Queue:

T1: i(1) i(2) i(3) r(1)
T2: r(2) i(4) r(4) r(3)

The two thread-induced histories i(1)i(2)i(3)r(1)r(2)r(3) and i(4)r(4) are both
linearizable with respect to a queue. However, the overall history has no sequential witness and is
therefore not sequentially consistent: To maintain the queue behavior, the order of
operations r(1) and r(2) cannot be changed. However, this implies that the value 3 instead of
the value 4 would have to be removed directly after i(4).


Figure 6 Sequential history $q_i$:
$q_i = ins(x_{i,1})\,rem(x_{i,1}) \cdots ins(x_{i,p})\,rem(x_{i,p})\; rem(x_{i,p+1}) \cdots rem(x_{i,q})\; rem(empty)^{e_i}\; ins(x_{i,q+1}) \cdots ins(x_{i,m})$.

(e) Stack:

T1: i(1) i(2)
T2: r(2) i(3) r(1) r(3)


The two thread-induced histories i(l)i(2)r(2)r(l) and i(3)r(3) are both linearizable with 
respect to a stack. The operations i(2) and r(2) prevent the reordering of operations 
i(l) and i(3). Therefore, the overall history has no sequential witness and hence it is not 
sequentially consistent. ◄ 
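
The thread-induced histories used in these examples can be computed mechanically. The sketch below assumes a history is given as a totally ordered list of (thread, operation, value) triples, which ignores overlap between operations; this encoding, the `EMPTY` marker, and the function names are illustrative assumptions. It reproduces the two thread-induced histories of the stack example (e).

```python
EMPTY = None  # hypothetical encoding of a removal that returns empty

def thread_induced(history, thread):
    """Keep the insertions of `thread`, the removals of values it inserted,
    and all removals that return empty."""
    inserted = {v for t, op, v in history if t == thread and op == "i"}
    return [
        (t, op, v) for t, op, v in history
        if (op == "i" and t == thread)
        or (op == "r" and (v is EMPTY or v in inserted))
    ]

def is_stack_sequence(seq):
    """Replay a sequence against a sequential LIFO stack."""
    stack = []
    for _, op, v in seq:
        if op == "i":
            stack.append(v)
        elif v is EMPTY:
            if stack:
                return False
        elif not stack or stack.pop() != v:
            return False
    return True

# Example (e): T1 performs i(1) i(2); T2 performs r(2) i(3) r(1) r(3).
h = [("T1", "i", 1), ("T1", "i", 2), ("T2", "r", 2),
     ("T2", "i", 3), ("T2", "r", 1), ("T2", "r", 3)]
for t in ("T1", "T2"):
    induced = thread_induced(h, t)
    print(t, [(op, v) for _, op, v in induced], is_stack_sequence(induced))
```

Both thread-induced sequences replay correctly on a sequential stack, while the overall order in which the operations are listed does not, because r(1) would have to return the value 3.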

Proposition (Pool, QC). Let h be a pool history in which no data is duplicated, no
thin-air values are returned, and no data is lost, i.e., h satisfies 1.-3. of Proposition (LocLin Pool).
Then h is quiescently consistent.

Proof. Assume h is a pool history that satisfies 1.-3. of Proposition (LocLin Pool). Let $h_1, \ldots, h_n$
be histories that form a sequential decomposition of h. That is, $h = h_1 \cdots h_n$ and the
only quiescent states in any $h_i$ are at the beginning and at the end of it. Note that this
decomposition has nothing to do with a thread-local decomposition. Let $M_i = M_{h_i}$ be the
set of methods of $h_i$, for $i \in \{1, \ldots, n\}$. Note that the sanity conditions 1.-3. ensure that
none of the following two situations can happen:

■ $rem(x) \in M_i$, $ins(x) \in M_j$, $j > i$,

■ $ins(x) \in M_i$, $rem(empty) \in M_j$, $rem(x) \in M_k$, $k > j > i$.

Let $V_i = \{x_{i,1}, \ldots, x_{i,m}\}$ denote the set of values in $M_i$ ordered in a way that there is a p
and q such that

■ $ins(x_{i,j}), rem(x_{i,j}) \in M_i$ for $j \leq p$;

■ $rem(x_{i,j}) \in M_i$ for $j > p$, $j \leq q$; and

■ $ins(x_{i,j}) \in M_i$ for $j > q$.

Moreover, let $e_i$ be the number of occurrences of rem(empty) in $h_i$.

We now construct a sequential history for h, which has the form $q = q_1 \cdots q_n$ where
each sequential history $q_i$ is a permutation of $M_i$ shown in Figure 6. Using the observations
above, it is easy to check that q is indeed a quiescent witness for h. ◄


Theorem 16 (Pool, Queue, & Stack, QC). For pool, local linearizability is stronger than
quiescent consistency. For queue and stack, local linearizability is incomparable to quiescent
consistency.

Proof. The following histories are quiescently consistent but not locally linearizable wrt 
pool, queue, and stack, respectively: 

(a) Pool:

T1: rem(empty)
T2: ins(1) rem(empty) rem(1)

(b) Queue:

T1: enq(1)
T2: enq(2) enq(3) deq(3) deq(2)

(c) Stack:

T1: push(1)
T2: push(2) push(3) pop(2) pop(3)


In all three histories, the only quiescent states are before and after the longest operation.
Therefore, all operations in thread T2 can be reordered arbitrarily, in particular in a way
such that they satisfy the sequential specification of the respective concurrent data
structure. However, in each case the thread-induced history for thread T2 is not linearizable with
respect to pool, queue, and stack, respectively. Therefore, none of these histories is locally
linearizable. Also here history (a) suffices.

On the other hand, the following histories are not quiescently consistent but locally 
linearizable wrt queue, and stack, respectively: 

(d) Queue:

T1: enq(1) deq(2)
T2: enq(2) deq(1)

(e) Stack:

T1: push(1) pop(1)
T2: push(2) pop(2)

In histories (d) and (e), between each two operations, the concurrent data structure
is in a quiescent state. Therefore, none of the operations can be reordered and, hence, no
sequential witness exists. However, all thread-induced histories are linearizable and, therefore,
the overall histories are locally linearizable. In particular, on a history where each pair of
operations is separated by a quiescent state, i.e., there is no overlap of operations, a quiescently
consistent data structure behaves as if it were linearizable with respect to its sequential
specification and we see the same semantic differences to local linearizability as we see
between linearizability and local linearizability. ◄


C Case Study: Work Stealing Queues

Consider a data structure D which admits two operation types: ins(x), which inserts the
element x into the container, and rem(), which returns and removes an element from the
container. Now imagine that the implementation uses a Work Stealing Queue (WSQ).
Every thread T that uses D has its unique designated buffer $Q_T$ in the WSQ. Whenever
thread T calls ins(x), x is appended to the tail of $Q_T$. When T calls rem(), WSQ first
T1: enq(3)
T2: enq(2) enq(1) deq(1) deq(2) deq(3)

Figure 7 History that is QQC but not LL.


checks whether $Q_T$ is non-empty; if it is, then it returns the element at the tail of $Q_T$ (LIFO
semantics) and removes it. Otherwise, it chooses some other $Q_{T'}$ and tries to return an
element from that buffer. But any time a different thread's buffer is checked, the element to
be removed is taken from the head (FIFO semantics). If T and $T'$ are both trying to access
the same buffer at the same time, then usual synchronization measures are taken to ensure
that exactly one thread removes one element.
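
The behavior of D described above can be sketched as follows. This is a simplified illustration, not an actual work-stealing deque: a single coarse lock stands in for the "usual synchronization measures", threads are identified by an integer index, the first non-empty foreign buffer is chosen for stealing, `None` encodes empty, and the class name is hypothetical.

```python
import threading
from collections import deque

class WorkStealingContainer:
    """Each thread owns one buffer: LIFO removal from the own buffer,
    FIFO removal when taking from another thread's buffer."""
    def __init__(self, num_threads):
        self._buffers = [deque() for _ in range(num_threads)]
        self._lock = threading.Lock()  # coarse stand-in for real synchronization

    def ins(self, tid, x):
        with self._lock:
            self._buffers[tid].append(x)  # append to the tail of the own buffer

    def rem(self, tid):
        with self._lock:
            own = self._buffers[tid]
            if own:
                return own.pop()          # tail of own buffer: LIFO
            for other, buf in enumerate(self._buffers):
                if other != tid and buf:
                    return buf.popleft()  # head of another buffer: FIFO
            return None                   # all buffers are empty

d = WorkStealingContainer(2)
d.ins(0, "x")
d.ins(1, "y")
print(d.rem(0), d.rem(0))
```

After ins(x) by thread 0 and ins(y) by thread 1, a rem() by thread 0 returns x while a rem() by thread 1 would have returned y, which is the caller-dependent behavior discussed next.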

Given this implementation, the developer of D wants to write a specification for the
potential users of D. Since D is essentially a collection of deques, the developer is tempted
to state that D is a deque with a particular consistency condition. However, D is not a
linearizable deque because ins(x) by T followed by ins(y) by $T'$ followed by rem() returns
either x or y depending on whether T or $T'$ calls it; i.e., rem() has ambiguous semantics.
D can be seen as a sequentially consistent (SC) deque but then D does not allow many
behaviors that an SC deque would allow; i.e., SC does not capture the behaviors of D
tightly. Relaxed sequential specifications will not work either since D does converge to
sequential semantics (of a LIFO stack) when a single thread uses it. In short, the developer
will fail to capture the semantics of D in a satisfactory manner.

D, on the other hand, is a locally linearizable deque in which rem() by T from $Q_{T'}$ is
treated as FIFO removal whenever $T \neq T'$ and as LIFO removal whenever $T = T'$. In other
words, local linearizability provides a succinct and clean representation of a well-known
implementation framework (WSQ), hiding away implementation details. Compare this with
the fact that even though WSQ has a queue in it, to argue its correctness it is proved to be
a linearizable pool even though it has stronger semantics than a pool; i.e., linearizable pool
semantics is too weak for D. Observe also that since what we have described in the example
is essentially providing the illusion of using a monolithic structure which is implemented
in terms of distributed components (shared memory is typically implemented on message
passing), we expect local linearizability to be widely applicable.

D Quiescent Consistency & Quantitative Quiescent Consistency

Without going into the details of the definition of quantitative quiescent consistency, we give
a history in Figure 7 that is quantitatively quiescently consistent but not locally linearizable
wrt a queue. Quantitative quiescent consistency allows reordering the two insert-operations
in thread T2 and thereby violates local linearizability.


E Consistency Conditions for Distributed Shared Memory









| Consistency Condition | Decomposition per | #SHs | Write-Operations $I_h(i)$ | Read-Operations $O_h(i)$ | CCfSH | LoD |
|---|---|---|---|---|---|---|
| LL | thread | n | $\{ins(v) \in h \vert T_i \mid v \in V\}$ | $\{head(v_{init}) \in h\} \cup \{head(v) \in h \mid ins(v) \in I_h(i)\}$ | Lin. | no |
| Coherence | memory location | k | $\{ins(v) \in h \mid v \in V\}$ | $\{head(v) \in h \mid v \in V\}$ | SC | yes |
| PRAM | thread | n | $\{ins(v) \in h \mid v \in V\}$ | $\{head(v) \in h \vert T_i \mid v \in V\}$ | SC | yes |
| PC | thread | n | $\{ins(v) \in h \mid v \in V\}$ | $\{head(v) \in h \vert T_i \mid v \in V\}$ | SC^a | yes |
| CC | thread | n | $\{ins(v) \in h \mid v \in V\}$ | $\{head(v) \in h \vert T_i \mid v \in V\}$ | SC^b | yes |
| LC | thread & memory location | n · k | $\{ins(v) \in h \vert T_i \mid v \in V\} \cup \{ins(v) \in h \mid head(v) \in h \vert T_i\}$ | $\{head(v) \in h \vert T_i \mid v \in V\}$ | SC^c | yes |

SC^a: SC and ins-operations are in the same order for each witness.

SC^b: SC and ins-operations are ordered by the transitive closure of the thread program orders and write-read pairs.

SC^c: SC and ins-operations from threads other than $T_i$ can be reordered even if they are from the same thread and only logical contradictions
in the local history are considered for consistency.

n: number of threads, k: number of memory locations,
#SHs: number of subhistories, CCfSH: consistency condition for subhistories, LoD: loss of data

Table 2 Comparison of consistency conditions for a single distributed shared memory location, i.e., k = 1
In Table 2 we compare local linearizability (LL) against the consistency conditions
coherence [3], pipelined RAM (PRAM) consistency, processor consistency (PC),
causal consistency (CC) [1], and local consistency (LC). Local linearizability shares
with all these consistency conditions the idea of decomposing a concurrent history into
several subhistories.

Coherence projects a concurrent history to the operations on a single memory location 
and each resulting history has to be sequentially consistent. Since sequential consistency is 
not compositional, coherence does not imply sequential consistency for the overall history [3] 
whereas local linearizability for each single memory location implies local linearizability for 
the overall history. 

In contrast to coherence and local consistency, local linearizability, PRAM consistency, 
PC, and CC all decompose the history into per-thread subhistories, i.e., if there are n threads 
then these conditions consider n subhistories and need n sequential witnesses. Coherence 
requires one witness per memory location and local consistency requires one witness per 
thread and memory location. 

For determining the subhistory for a thread $T_i$, coherence, PRAM consistency, PC, and
CC consider all write-operations in a given history, i.e., $I_h(i) = \{ins(v) \in h \mid v \in V\}$. In
contrast, local linearizability only considers the write-operations in thread $T_i$, i.e.,
$I_h(i) = \{ins(v) \in h|T_i \mid v \in V\}$, and local consistency considers all write-operations in thread $T_i$ as
well as all write-operations whose values are read in thread $T_i$, i.e.,
$I_h(i) = \{ins(v) \in h|T_i \mid v \in V\} \cup \{ins(v) \in h \mid head(v) \in h|T_i\}$. Regarding read-operations, PRAM consistency,
PC, CC, and LC consider only the read-operations in thread $T_i$. Coherence considers all
read-operations in a given history, and local linearizability only considers read-operations
that read the initial value $v_{init}$ and read-operations that read values that were written by a
write-operation in thread $T_i$. Reading the initial value is analogous to returning empty in a
data structure.
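The selection of operations per thread can be made concrete with a small sketch. The following Python fragment is an illustration only and not part of the formal development: the `Op` record, the encoding of the initial value as `None`, and the function names are assumptions introduced here, and every value is assumed to be written at most once. It computes the operations in $I_h(i) \cup O_h(i)$ for local linearizability and, for comparison, the selection used by PRAM consistency, PC, and CC.

```python
from typing import NamedTuple


class Op(NamedTuple):
    thread: int    # index i of the thread T_i that performed the operation
    kind: str      # "ins" (write-operation) or "head" (read-operation)
    value: object  # value written or read


V_INIT = None  # assumed encoding of the initial value v_init


def ll_subhistory(history, i):
    """Operations of the thread-induced history of T_i (local linearizability)."""
    written = {op.value for op in history if op.kind == "ins" and op.thread == i}
    return [
        op for op in history
        if (op.kind == "ins" and op.thread == i)
        or (op.kind == "head" and (op.value is V_INIT or op.value in written))
    ]


def pram_subhistory(history, i):
    """All write-operations, but only the read-operations of T_i (PRAM, PC, CC)."""
    return [
        op for op in history
        if op.kind == "ins" or (op.kind == "head" and op.thread == i)
    ]
```

For a history in which T1 writes a value that T2 later reads, `ll_subhistory(history, 1)` keeps the read by T2, whereas `pram_subhistory(history, 1)` drops it and instead keeps the writes of T2.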

Local linearizability requires that each subhistory, i.e., thread-induced history, is linearizable
with respect to the sequential specification under consideration. In contrast, coherence,
PRAM consistency, PC, CC, and LC require that each subhistory is sequentially consistent 
(or a variant thereof) with respect to the sequential specification. However, the variants 
of sequential consistency that are used by these consistency conditions are vulnerable to 
a loss of data as discussed in Section and, therefore, make these consistency conditions 
unsuitable for concurrent data structures. 

When considering PRAM consistency, the sequentialization of the write-operations of different
threads might be observed differently by different threads, e.g., a thread $T_1$ might observe
all write-operations of thread $T_2$ before the write-operations of thread $T_3$, but a thread $T_4$
might observe all write-operations of $T_3$ before the write-operations of $T_2$. In contrast,
thread-induced histories as defined by local linearizability do not involve write-operations from other
threads but involve (some) read-operations performed by other threads. Like PRAM consistency,
processor consistency requires for each thread $T_i$ that the read- and write-operations
performed by $T_i$ are seen in $T_i$'s program order and that the write-operations performed by
other threads are seen in their respective program order. Furthermore, processor consistency
also requires that two write-operations to the same memory location appear in the same order
in each sequential witness of each thread even if they are from different threads.
This additional condition makes processor consistency strictly stronger than PRAM consistency [3].
This condition also creates an effect similar to the consideration of read-operations
in different threads when forming the thread-induced history in local linearizability. Causal
consistency considers a causal order instead of the thread program orders alone. Like local
linearizability, causal consistency matches write-read pairs across different threads. In
particular, the causal order is the transitive closure of the thread program orders and write-read
pairs. By considering the causal order, writes from different threads can become ordered,
which is not the case for local linearizability.

Figure 8 Segment modifications throughout announce_thread().
Panels: (a) Initial state ($\ell = 4$, $v = 4$); (b) Add new node ($\ell = 4$, $v = 4$); (c) Adjust $\ell$, then $v$ ($\ell = 5$, $v = 5$).


F LLD and LL+D Implementation Details

As already mentioned, each thread inserts elements into a local backend and removes elements
either from its local backend (preferred) or from other backends (fall-back) accessed
through a single segment (thread-indexed array), effectively managing
single-producer/multiple-consumer backends for a varying number of threads.

The segment is dynamic in length (with a predefined maximum). A slot in this segment
refers to a node that consists of a backend and a flag indicating whether the corresponding
thread is alive or has terminated. Similar to other work, the flag is used for
logically removing the node from the segment (it stays in the segment until its backend is
empty). Additionally, a (global) version number keeps track of all changes in the segment.
The algorithm is divided into two parts: (1) maintaining the segment, and (2) adding elements
to and removing elements from backends.

In the following we refer to the segment as $s$, a thread's local node as $n_i$, the version
number of the segment as $v$, and the current length of the segment as $\ell$. The range of valid
indices $r$ is then defined as $0 \le r < \ell$.

For maintaining the segment we provide two methods, announce_thread() and
cleanup_thread(node), that are used to add nodes to and remove nodes from the segment. Upon
removal of a node the segment is also compacted, i.e., the hole that is created by removing
a node pointer is filled with the last node pointer in the segment. As nodes are added and
removed, the length $\ell$ of the segment, and thus the range of valid indices $i$ of the segment,
$0 \le i < \ell$, is updated. All changes to the segment involve incrementing the version number.

In more detail, the operations for maintaining the segment and compacting it as nodes
are cleaned up are:

■ announce_thread(): Allocates a node for the thread as follows: it searches for an existing
node of a terminated thread and reuses it if it finds one; otherwise it creates a new node,
adds the node to $s$, and adjusts $\ell$. In both cases it then increments $v$ and returns the
node. The creation of a new node is illustrated in Figure 8.

■ cleanup_thread(Node n): Searches for the node n in $s$ using linear search. If it finds n
at slot $j$, it copies the pointer of $s[\ell - 1]$ to $s[j]$, decrements $\ell$, increments $v$, and resets
$s[\ell]$ to null using the new $\ell$. If n is not found, then a concurrent thread has already
performed the cleanup and the operation just returns. Figure 9 illustrates an example
where initially $\ell = 5$, the thread owning the node at $s[0]$ is dead, and the corresponding
backend is empty.

Figure 9 Segment modifications throughout cleanup_thread(Node n).
Panel (a): n at s[0] is empty and dead; after the cleanup, $\ell = 4$ and $v = 6$.
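The two maintenance operations described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the class and attribute names are hypothetical, a plain list stands in for a linearizable backend, a single lock serializes both operations, and the sketch follows the prose description (it does not take the additional version argument that the pseudo code passes to cleanup_thread).

```python
import threading


class Node:
    def __init__(self):
        self.backend = []   # stand-in for a linearizable backend (assumption)
        self.alive = True   # False once the owning thread has terminated


class Segment:
    def __init__(self, max_threads):
        self.nodes = [None] * max_threads
        self.length = 0                 # l: number of slots in use
        self.version = 0                # v: incremented on every change
        self._lock = threading.Lock()   # announce and cleanup are assumed infrequent

    def announce_thread(self):
        with self._lock:
            # Reuse the node of a terminated thread if one exists.
            node = next((n for n in self.nodes[:self.length] if not n.alive), None)
            if node is None:
                if self.length == len(self.nodes):
                    raise RuntimeError("segment is full")
                node = Node()
                self.nodes[self.length] = node
                self.length += 1
            node.alive = True
            self.version += 1
            return node

    def cleanup_thread(self, node):
        with self._lock:
            in_use = self.nodes[:self.length]
            if node not in in_use or node.alive:
                return                  # already cleaned up, or reused meanwhile
            j = in_use.index(node)
            self.nodes[j] = self.nodes[self.length - 1]   # fill the hole
            self.length -= 1
            self.version += 1
            self.nodes[self.length] = None                # reset using the new l
```

Between the copy into slot j and the reset of the last slot, two slots refer to the same node, which is the intermediate state that concurrent remove operations have to tolerate.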

Note that updating the segment state is only needed when threads are joining or when 
backends of terminated threads become empty. We consider both scenarios as infrequent and 
implement the corresponding operations using locks. Alternatively, those operations can be
implemented using helping approaches, similar to wait-free algorithms [29]. Also note that
although operations on segments are protected by locks, partial changes can be observed, 
e.g., a remove operation (as defined below) can observe a segment in an intermediate state 
with two pointers pointing to a node during cleanup. The invariant is that no change can 
destroy the integrity of the segment within the valid range, i.e., all slots within the range 
either point to a valid node or nothing (null). 

The actual algorithm for adding and removing elements is then defined as follows: 

■ ins(): Upon first insertion, a thread $T_i$ gets assigned a node $n_i$ (containing backend $b_i$)
using announce_thread(). The element is then inserted into $b_i$. Subsequent insertions
from this thread use $n_i$ throughout the lifetime of the thread.

■ rem(): The remove operation consists of two parts: (a) finding and removing an element
and (b) cleaning up nodes of terminated threads. For (a) a thread $T_i$ tries to get an
element from its own backend in $n_i$. If $n_i$ does not exist (because the thread has not
yet performed a single ins() operation) or the corresponding backend is empty, then a
different node n is selected randomly within the valid range. If the backend contained
in n is empty, the operation scans all other nodes' backends in linear fashion. However,
if the version number changed during the round of scanning through all backends, the
operation is restarted immediately. Note that since $\ell$ is dynamic, a remove operation may
operate on a range that is no longer valid. Checking the version number ensures that the
operation is restarted in such a case. For (b) a thread calls cleanup_thread(n) upon
encountering a node n that has its alive-flag set to false (dead) and contains an empty
backend. A cleanup also triggers a restart of the remove operation.

■ terminate(): Upon termination a thread $T_i$ changes the alive flag of $n_i$ to false (dead).
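The insertion and removal strategy above (local backend first, then a scan of the other backends starting at a random position) can be sketched sequentially. The following Python fragment is an illustration under simplifying assumptions: the class name is hypothetical, thread identities are passed explicitly, a `deque` stands in for each linearizable queue backend, and the version check, the alive flags, and the cleanup of terminated threads are omitted.

```python
import random
from collections import deque


class LLDQueueSketch:
    def __init__(self):
        self.backends = {}   # thread id -> backend of that thread

    def ins(self, tid, element):
        # A thread always inserts into its own backend (created on first use).
        self.backends.setdefault(tid, deque()).append(element)

    def rem(self, tid):
        # Fast path: try the thread-local backend first.
        local = self.backends.get(tid)
        if local:
            return local.popleft()
        # Fall-back: scan all backends linearly, starting at a random index.
        others = list(self.backends.values())
        if not others:
            return None
        start = random.randrange(len(others))
        for offset in range(len(others)):
            backend = others[(start + offset) % len(others)]
            if backend:
                return backend.popleft()
        return None          # every backend was found empty
```

Because each backend receives the insertions of exactly one thread, the elements of a single thread leave the structure in that thread's insertion order, regardless of which backend a remover happens to visit first.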

Dynamic memory used for nodes is susceptible to the ABA problem and requires proper
handling to free memory. Our implementations use 16-bit ABA counters to avoid the ABA
problem and refrain from freeing memory. Hazard pointers [33] can be used for solving the
ABA problem as well as for freeing memory.

F.1 LL+D: LLD with Linearizable Emptiness Check

We call a data structure implementation $\Phi$ stateful if the remove methods of $\Phi$ can be
modified to return a so-called state that changes upon an insert or a remove of an element,
but does not change between two removes that return empty unless an element has been
inserted in the data structure in the meantime. For stateful implementations $\Phi$ we can create
the locally linearizable version with linearizable emptiness check, LL+D $\Phi$. Michael-Scott
queue [31] and Treiber stack are stateful implementations, whereas LCRQ is not.
TS stack [13], k-FIFO [55], and k-Stack [53] are also stateful implementations, but the
state in these data structures is so large that it makes them unsuitable for LL+D.

For LL+D implementations, linearizable emptiness checks are achieved via an atomic
snapshot, just like for DQs. A detailed description of the LLD and LL+D implementations,
as well as the pseudo code, can be found in the appendix. Here, we only present the
results of the experimental performance evaluation.


F.2 Correctness of LL+D 

► Proposition 7 (LLD and LL+D). Let $\Phi$ be a stateful data structure implementation that is
linearizable with respect to a sequential specification $S_\Phi$. Then LL+D $\Phi$ is linearizable with
respect to a pool.


Proof. Proving that LL+D $\Phi$ is linearizable with respect to a pool, in particular that it has
a linearizable emptiness check, follows the proof for DQ in general, see [15]: The emptiness
check is performed by creating an atomic snapshot of the states of all backends (stored
in the states array) using the first loop (lines 28-43 of Listing 3). If the atomic snapshot is valid
(checked via the second loop, lines 46-52, in particular line 48) and all backends are empty
in this atomic snapshot, then there existed a point in time during the creation of the atomic
snapshot where all backends were indeed empty.

Notice that since the segment is dynamic in length, it can happen that some backends
are not contained in the atomic snapshot. To guarantee that no elements are missed in the
emptiness check, the atomic snapshot is extended by the version number $v$ of the segment. If
a new backend is added to the segment during the generation of the atomic snapshot, then
the version number is increased and the atomic snapshot becomes invalid (line 45).

The linearization point of the remove operation that returns empty is in between the two
loops (the last remove attempt of the first loop) if the version check and second loop go
through. ◄
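The structure of this emptiness check (collect the states, re-check the version, then validate the states) can be illustrated with a short sequential sketch. The Python fragment below is an assumption-laden illustration rather than the authors' code: `StatefulQueue` and `rem_with_emptiness_check` are hypothetical names, the state is modelled as an insertion counter, `None` encodes empty (so `None` cannot be stored as an element), and the segment version is supplied by a caller-provided function.

```python
from collections import deque


class StatefulQueue:
    """Sequential stand-in for a stateful backend; the state counts insertions."""

    def __init__(self):
        self._items = deque()
        self._state = 0

    def ins(self, element):
        self._items.append(element)
        self._state += 1

    def rem(self):
        element = self._items.popleft() if self._items else None
        return element, self._state

    def get_state(self):
        return self._state


def rem_with_emptiness_check(backends, version_of):
    while True:
        old_version = version_of()
        states = []
        found = None
        for backend in backends:              # first loop: try to remove, record states
            element, state = backend.rem()
            if element is not None:
                found = element
                break
            states.append(state)
        if found is not None:
            return found
        if old_version != version_of():       # a backend was added or removed: retry
            continue
        # Second loop: the snapshot is valid only if no state has changed.
        if all(b.get_state() == s for b, s in zip(backends, states)):
            return None                       # all backends were empty at one point
```

In a concurrent setting the retry paths are what make the returned empty result trustworthy; in this sequential sketch they are never taken, and the code only shows where the checks sit.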


F.3 LLD Pseudo Code 

All implementations use the interfaces depicted in Listing 1. For simplicity, the interface
only mentions pool, queue, and stack. The highlighted code refers to the linearizable emptiness
check, i.e., it is only part of the LL+D implementations: Methods retrieving elements (e.g.,
rem) are assumed (or modified when possible) to also return a State object that uniquely
identifies the state of the data structure with respect to methods inserting elements (e.g.,
ins). The same state can be accessed via the get_state() observer method.

Listing 2 illustrates the pseudo-code for maintaining the segment. The backend on line 2
can either be declared as Stack or Queue as defined in Listing 1 (or any other linearizable
data structure).

Listing 3 shows the pseudo-code for LL+D. When removing the highlighted code, we
obtain the code for LLD. Each thread maintains its own backend, enclosed in a thread-local
node (line 3), for insertion. The local backend is always accessed through get_local_node
(line 5). This method also makes sure that a thread is announced (line 7) upon first
insertion and acquires a node. An ins() operation then always uses a thread's local backend
(lines 13 and 14) for insertion. For removing an element in rem(), a thread tries to remove an
element from its local backend first (lines 19-23). If no element can be found, all backends
in the valid range are searched in a linear fashion, starting from a random index. The
highlighted code (lines 46-52) illustrates checking the atomic snapshot for LL+D.

    Pool {
      <Element, State> rem();
      void ins(Element e);
      State get_state();
    }

    Queue : Pool {
      <Element, State> dequeue();
      void enqueue(Element e);
      void ins(Element e) => enqueue(e);
      <Element, State> rem() => dequeue();
    }

    Stack : Pool {
      <Element, State> pop();
      void push(Element e);
      void ins(Element e) => push(e);
      <Element, State> rem() => pop();
    }

Listing 1 Pool, queue, and stack interfaces


F.4 LLD with Observer Methods 

We have implemented LLD variants of (strict and relaxed) queue and stack implementations. 
None of our LLD implementations involves observer methods, but the LLD algorithm can 
easily be extended to support observer methods: 

■ A data observer on LLD $\Phi$ (independently of which thread performs it) amounts to a
data observer on any $\Phi_i$.

■ A local shape observer on LLD $\Phi$ performed by thread $T$ executes the shape observer
on $\Phi_T$.

■ A global shape observer on LLD $\Phi$ executes the shape observer on each backend $\Phi_i$ and
produces an aggregate value.


G Additional Implementations

We now present and evaluate additional algorithms that provide locally linearizable variants
of queues and stacks, obtained by modifying relaxed k-out-of-order queues and stacks [53, 55]
in a way that makes them sequentially correct. We have also tried another generic
implementation, related to the construction in [5], that implements a flat-combining wrapper
with sequential (to be precise, single-producer multiple-consumer) backends. In our initial
experiments the performance of such an implementation was not particularly promising.


G.1 Locally Linearizable k-FIFO Queue and k-Stack

k-FIFO queues [55] and k-Stacks [53] are relaxed queues and stacks based on lists of segments
where each segment holds $k$ slots for elements, effectively allowing reorderings of elements of
up to $k-1$. The list of segments is implemented by a variant of Michael-Scott queue [31] for
k-FIFO and a variant of Treiber stack for k-Stack. Insert and remove methods operate
on the segments ignoring any order of elements within the same segment. Segments used
for insertion and removal are identified by insertion and removal pointers, respectively.

     1  Node {
     2    Pool backend; // Any linearizable data structure.
     3    Bool alive;
     4  }
     5
     6  Segment {
     7    Node nodes[MAX_THREADS];
     8    Int l = 0;
     9    Int version = 0;
    10
    11    // Returns all indexes between 0 and l (exclusive) in random order.
    12    [Int] range();
    13
    14    // Announces a node in the buffer, effectively adding it to nodes,
    15    // adjusting l, and changing the version.
    16    Node announce_thread() {
    17      segment_lock(); // Protecting against concurrent announce or cleanup operations.
    18      Node n = find_dead_node();
    19      if n == null {
    20        n = Node(b: Backend());
    21        nodes[l] = n;
    22        l++;
    23      }
    24      n.alive = true;
    25      version++;
    26      segment_unlock();
    27      return n;
    28    }
    29
    30    // Removes a node from the buffer, effectively removing it from nodes,
    31    // adjusting l, and changing the version.
    32    void cleanup_thread(Node n, Int old_version) {
    33      segment_lock(); // Protecting against concurrent announce or cleanup operations.
    34      <j, error> = find_node_in_segment(n);
    35      if error || n.alive || old_version != version {
    36        segment_unlock();
    37        return;
    38      }
    39      nodes[j] = nodes[l-1];
    40      l--;
    41      version++;
    42      nodes[l] = null;
    43      segment_unlock();
    44    }
    45  }

Listing 2 Node and segment structure for LLD and LL+D (queue or stack)

For queues, elements are removed from the oldest segment and inserted into the most-recent
not-full segment. Upon trying to remove an element from an empty segment, the
segment is removed and the removal pointer is advanced to the next segment. Upon trying to
insert an element into a full segment, a new segment is appended and the insertion pointer
is advanced to this new segment. For a stack the scheme is similar but not identical: removal
and insertion operate on the most-recent segment, i.e., the removal and insertion pointers are
synonyms and identify the same segment at all times. Again, upon trying to remove an element from
an empty segment, the segment is removed and the removal pointer is advanced to the next
segment. Upon trying to insert an element into a full segment, a new segment is prepended
and the insertion pointer is set to this new segment.

k-FIFO queues and k-Stacks are relaxed queues and stacks that are: (1) linearizable with
respect to k-out-of-order queue and stack, respectively; (2) linearizable with respect to
a pool; (3) not locally linearizable with respect to queue and stack, respectively, for
$k > 1$, since reordering elements that are inserted in the same segment (even sequentially by
a single thread) is allowed, see the histories (b) and (c) in the proof of Theorem 15; and (4)
not sequentially consistent with respect to queue and stack, as shown by the histories (d)
and (e) in the proof of Theorem 15, which are k-FIFO and k-Stack histories, respectively, for
$k > 1$.

     1  DynamicLocallyLinearizableDQ {
     2    Segment s;
     3    thread_local Node local_node;
     4
     5    Node get_local_node(Bool create_if_absent) {
     6      if (create_if_absent) && (local_node == null) {
     7        local_node = s.announce_thread();
     8      }
     9      return local_node;
    10    }
    11
    12    void ins(Element e) {
    13      n = get_local_node(create_if_absent: true);
    14      n.backend.ins(e);
    15    }
    16
    17    Element rem() {
    18      // Fast path of retrieving an element from the thread-local backend.
    19      n = get_local_node(create_if_absent: false);
    20      if n != null {
    21        <e, state> = n.backend.rem();
    22        if e != null { return e; }
    23      }
    24      while true {
    25        retry = false;
    26        old_version = s.version;
    27        range = s.range();
    28        for i in range {
    29          n = s.nodes[i];
    30          if old_version != s.version {
    31            retry = true; break; }
    32          Bool alive = n.alive;
    33          <e, state> = n.backend.rem();
    34          if e == null {
    35            states[i] = state;
    36            if !alive {
    37              s.cleanup_thread(n, old_version);
    38              retry = true; break;
    39            }
    40          } else {
    41            return e;
    42          }
    43        }
    44        if retry { continue; }
    45        if old_version != s.version { continue; }
    46        for i in range {
    47          n = s.nodes[i];
    48          if n == null || n.backend.get_state() != states[i] {
    49            retry = true;
    50            break;
    51          }
    52        }
    53        if retry { continue; }
    54
    55        return null; // Empty case.
    56      }
    57    }
    58
    59    // Called upon thread termination.
    60    void terminate() {
    61      n = get_local_node(create_if_absent: false);
    62      if n != null { n.alive = false; }
    63    }
    64  }

Listing 3 LLD and LL+D (queue and stack)

We now present LL k-FIFO and LL k-Stack, modifications of k-FIFO and k-Stack that
enforce local linearizability by ensuring that no thread inserts more than once into a single
segment. Assuming that segments are unique (by tagging pointers), LL k-FIFO remembers
the last used insertion pointer per thread. For LL k-Stack the situation is more subtle
as (due to the stack semantics) segments can be reached multiple times for insertion and
removal. Figure 10 illustrates an example where the top segment of a k-Stack is reached
multiple times by the same thread ($T_1$). Since in the general case all segments could be
reached multiple times by a single thread, it is required to maintain the full history of each
thread's insertions. Assuming the maximum number of threads is known in advance, a
bitmap is used to record in which segments a thread has already pushed a
value. One can similarly implement a locally linearizable version of the Segment Queue [2].


Figure 10 LL k-Stack run ($k = 2$). $T_1$ can only insert in uncolored segments and needs to
prepend a new segment (for insertion) otherwise.


G.1.1 k-FIFO Queue and LL k-FIFO Queue Pseudo Code.

Listing 4 shows the pseudo code for LL k-FIFO queue. Again we highlight the code we
added to the original pseudo code. Similar to the locally linearizable k-Stack, each
thread inserts at most one element into a segment. However, in the k-FIFO queue we do
not need flags in each segment to achieve this property. It is sufficient to remember the last
segment used for insertion for each thread (set_last_tail; line 15). For each enqueue the
algorithm checks whether the executing thread has already used this segment for enqueueing
an element (get_last_tail; line 5). If the segment has already been used, the thread tries
to append a new segment (effectively adding a new tail).
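The effect of remembering the last tail per thread can be seen in a sequential model. The following Python sketch is an illustration only, under assumptions introduced here: the class name is hypothetical, segments are plain lists identified by a running counter, a dequeue removes an arbitrary element of the head segment to mimic the relaxation, and none of the lock-free machinery (CAS, version tags, the committed check) is modelled.

```python
import random
from collections import deque


class LLKFifoModel:
    def __init__(self, k):
        self.k = k
        self.segments = deque([[]])   # oldest (head) segment first
        self.tail_id = 0              # identity of the current tail segment
        self.last_tail = {}           # thread id -> id of the segment it last used

    def enqueue(self, tid, item):
        tail_full = len(self.segments[-1]) == self.k
        if self.last_tail.get(tid) == self.tail_id or tail_full:
            # The thread already used this tail (or it is full): add a new tail.
            self.segments.append([])
            self.tail_id += 1
        self.segments[-1].append(item)
        self.last_tail[tid] = self.tail_id

    def dequeue(self):
        while not self.segments[0]:
            if len(self.segments) == 1:
                return None           # the queue is empty
            self.segments.popleft()   # advance the head past an empty segment
        if len(self.segments) == 1:
            # Head and tail coincide: advance the tail before removing.
            self.segments.append([])
            self.tail_id += 1
        head = self.segments[0]
        return head.pop(random.randrange(len(head)))   # any slot of the head segment
```

Since two elements of the same thread never share a segment and segments are drained oldest first, each thread's elements are dequeued in the order in which that thread enqueued them, while elements of different threads within one segment may still be reordered.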


G.1.2 Correctness Proof of LL k-FIFO Queue.

Given the characterization of local linearizability for queues stated earlier as a theorem, the
proof of correctness of LL k-FIFO queue is easy.

► Theorem 21 (Correctness of LL k-FIFO). LL k-FIFO queue presented in Listing 4 is locally
linearizable.

Proof. Using this characterization, as a first proof obligation we have to show that any history $h$ of
the LL k-FIFO queue is locally linearizable with respect to the pool sequential specification
$S_P$. This proof is analogous to the proof that any history of the LL k-Stack is locally
linearizable with respect to the pool sequential specification $S_P$, and is therefore postponed
until the corresponding LL k-Stack theorem.

What remains to show is that

$$\forall x, y \in V.\ \forall i.\ \mathrm{enq}(x) <_h^i \mathrm{enq}(y) \wedge \mathrm{deq}(y) \in h \Rightarrow \mathrm{deq}(x) \in h \wedge \mathrm{deq}(y) \not<_h \mathrm{deq}(x).$$

Assume $\mathrm{enq}(x) <_h^i \mathrm{enq}(y)$. This means that $x$ and $y$ were enqueued by the same thread
$T_i$ and therefore inserted into different segments. Moreover, the segment of $x$ is closer to
the head of the list than the segment of $y$. A $\mathrm{deq}(y)$ method call can remove $y$ only if the
segment of $y$ is the head segment. The segment of $y$ can only become the head segment if
all segments closer to the head of the list become empty. This means that the segment of $x$
also has to become empty. Therefore there has to exist a $\mathrm{deq}(x)$ method call which removes $x$
from the segment, and $\mathrm{deq}(y) \not<_h \mathrm{deq}(x)$.

     1  LocallyLinearizableKFIFOQueue {
     2    enqueue(item):
     3      while true:
     4        tail_old = get_tail();
     5        if get_last_tail(thread_id) == tail_old:
     6          advance_tail(tail_old, k);
     7          continue; // Restart while loop.
     8        head_old = get_head();
     9        item_old, index = find_empty_slot(tail_old, k);
    10        if tail_old == get_tail():
    11          if item_old.value == EMPTY:
    12            item_new = atomic_value(item, item_old.version + 1);
    13            if CAS(&tail_old->segment[index], item_old, item_new):
    14              if committed(tail_old, item_new, index):
    15                set_last_tail(thread_id, tail_old);
    16                return true;
    17          else:
    18            advance_tail(tail_old, k);
    19
    20    bool committed(tail_old, item_new, index):
    21      if tail_old->segment[index] != item_new:
    22        return true;
    23      head_current = get_head();
    24      tail_current = get_tail();
    25      item_empty = atomic_value(EMPTY, item_new.version + 1);
    26      if in_queue_after_head(tail_old, tail_current, head_current):
    27        return true;
    28      else if not_in_queue(tail_old, tail_current, head_current):
    29        if !CAS(&tail_old->segment[index], item_new, item_empty):
    30          return true;
    31      else: // In queue at head.
    32        head_new = atomic_value(head_current.value, head_current.version + 1);
    33        if CAS(&head, head_current, head_new):
    34          return true;
    35        if !CAS(&tail_old->segment[index], item_new, item_empty):
    36          return true;
    37      return false;
    38
    39    item dequeue():
    40      while true:
    41        head_old = get_head();
    42        item_old, index = find_item(head_old, k);
    43        tail_old = get_tail();
    44        if head_old == get_head():
    45          if item_old.value != EMPTY:
    46            if head_old.value == tail_old.value:
    47              advance_tail(tail_old, k);
    48            item_empty = atomic_value(EMPTY, item_old.version + 1);
    49            if CAS(&head_old->segment[index], item_old, item_empty):
    50              return item_old.value;
    51          else:
    52            if head_old.value == tail_old.value && tail_old.value == get_tail():
    53              return null;
    54            advance_head(head_old, k);
    55  }

Listing 4 Locally Linearizable k-FIFO Queue

G.1.3 k-Stack and LL k-Stack Pseudo Code.


Listing 5 shows the pseudo code for LL k-Stack. The highlighted code is the code we
added to the original pseudo code to achieve local linearizability. The difference to
the original algorithm is that a thread inserts at most one element into a segment. To
achieve this property, each segment in the k-Stack contains a flag per thread which is set
when an element is inserted into the segment (mark_segment_as_used; line 14 and line 60).
If a thread encounters a segment where its flag is already set, the thread does not insert
its element into that segment but tries to prepend a new segment (segment_is_marked;
line 50). Otherwise the element is inserted into the existing segment and the flag of the
thread in that segment is set.
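The per-thread flags can be pictured as a bitmap stored in each segment. The sketch below is a sequential Python model under assumptions made for illustration: `KSegment` and `LLKStackModel` are hypothetical names, thread ids are small non-negative integers used as bit positions, a pop removes an arbitrary element of the top segment to mimic the relaxation, and the lock-free details of the pseudo code are left out.

```python
import random


class KSegment:
    def __init__(self):
        self.items = []
        self.used_by = 0   # bitmap: bit t is set once thread t has pushed here


class LLKStackModel:
    def __init__(self, k):
        self.k = k
        self.segments = [KSegment()]   # the top segment is the last list entry

    def push(self, tid, item):
        top = self.segments[-1]
        already_used = (top.used_by >> tid) & 1
        if already_used or len(top.items) == self.k:
            # Flag already set (or segment full): prepend a new top segment.
            top = KSegment()
            self.segments.append(top)
        top.items.append(item)
        top.used_by |= 1 << tid        # mark the segment as used by this thread

    def pop(self):
        while not self.segments[-1].items:
            if len(self.segments) == 1:
                return None            # only one, empty segment left: stack is empty
            self.segments.pop()        # remove the empty top segment
        top = self.segments[-1].items
        return top.pop(random.randrange(len(top)))   # any slot of the top segment
```

A flag stays set even after the corresponding element has been popped, so a thread that returns to a segment it has used before is forced onto a fresh segment, which is why the full per-segment history is kept rather than a single last-used pointer.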


     1  LocallyLinearizableKStack {
     2    SegmentPtr top;
     3
     4    void init():
     5      new_ksegment = calloc(sizeof(ksegment));
     6      top = atomic_value(new_ksegment, 0);
     7
     8    bool try_add_new_ksegment(top_old, item):
     9      if top_old == top:
    10        new_ksegment = calloc(sizeof(ksegment));
    11        new_ksegment->next = top_old;
    12        new_ksegment->s[0] = atomic_value(item, 0); // Use first slot for item.
    13        top_new = atomic_value(new_ksegment, top_old.ver+1);
    14        mark_segment_as_used(top_new);
    15        if CAS(&top, top_old, top_new):
    16          return true;
    17      return false;
    18
    19    void try_remove_ksegment(top_old):
    20      if top_old == top:
    21        if top_old->next != null:
    22          atomic_increment(&top_old->remove);
    23          if empty(top_old):
    24            top_new = atomic_value(top_old->next, top_old.ver+1);
    25            if CAS(&top, top_old, top_new):
    26              return;
    27          atomic_decrement(&top_old->remove);
    28
    29    bool committed(top_old, item_new, index):
    30      if top_old->s[index] != item_new:
    31        return true;
    32      else if top_old->remove == 0:
    33        return true;
    34      else: // top_old->remove >= 1
    35        item_empty = atomic_value(EMPTY, item_new.ver+1);
    36        if top_old != top:
    37          if !CAS(&top_old->s[index], item_new, item_empty):
    38            return true;
    39        else:
    40          top_new = atomic_value(top_old.val, top_old.ver+1);
    41          if CAS(&top, top_old, top_new):
    42            return true;
    43          if !CAS(&top_old->s[index], item_new, item_empty):
    44            return true;
    45      return false;
    46
    47    void push(item):
    48      while true:
    49        top_old = top;
    50        if segment_is_marked(top_old):
    51          if try_add_new_ksegment(top_old, item):
    52            return true;
    53          continue; // Restart while loop.
    54        item_old, index = find_empty_slot(top_old);
    55        if top_old == top:
    56          if item_old.val == EMPTY:
    57            item_new = atomic_value(item, item_old.ver+1);
    58            if CAS(&top_old->s[index], item_old, item_new):
    59              if committed(top_old, item_new, index):
    60                mark_segment_as_used(top_old);
    61                return true;
    62          else:
    63            if try_add_new_ksegment(top_old, item):
    64              return true;
    65
    66    item pop():
    67      while true:
    68        top_old = top;
    69        item_old, index = find_item(top_old);
    70        if top_old == top:
    71          if item_old.val != EMPTY:
    72            item_empty = atomic_value(EMPTY, item_old.ver+1);
    73            if CAS(&top_old->s[index], item_old, item_empty):
    74              return item_old.val;
    75          else:
    76            if only_ksegment(top_old):
    77              if empty(top_old):
    78                if top_old == top:
    79                  return null;
    80            else:
    81              try_remove_ksegment(top_old);
    82  }

Listing 5 Locally Linearizable k-Stack


G.1.4 Correctness Proof of LL k-Stack.

The local linearizability proof of LL k-Stack is more involved, but very interesting. We use
a theorem from the published artifact of [11], which has been mechanically proved in the
Isabelle/HOL theorem prover.

► Theorem 22 (Empty Returns for Stack). Let $h$ be a history, and let $h'$ be the projection of
$h$ to $\Sigma \setminus \mathrm{pop}(\mathtt{empty})$. If $h$ is linearizable with respect to the sequential specification $S_P$ of a
pool (as defined earlier), and $h'$ is linearizable with respect to the sequential specification $S_S$
of a stack (as defined earlier), then $h$ is linearizable with respect to $S_S$.

Proof. Here we repeat the key insights of the proof and leave out technical details. A 
complete and mechanized version of the proof is available in the published artifact of [Tl]. 

As h is linearizable with respect to Sp, and h' is linearizable with respect to Ss, there 
exists a sequential history s G Sp such that s is a linearization of h, and there exists 
a sequential history s' G Ss such that s' is a linearization of h'. We show that we can 
construct a sequential history t G Ss such that t is a linearization of h. 

The linearization t is constructed as follows: the position of pop(empty) in s is pre¬ 
served in t. This means for any method call m G s that if pop(empty) zn, then also 
pop(empty) zn, and if m -<s pop(enipty), then also m -it pop(einpty). Moreover, if two 
method calls m,n Gs are ordered as m —ig pop(empty) —(g n and therefore by transitivity it 
holds that m -<s n, then also m n. 

For all other method calls the order of s' is preserved. This means for any two method 
calls m,n G s with m -<s' n, that if for all pop(empty) it holds that pop(empty) ^g m if and 
only if pop(empty) ^g n, then m —(t n. 

By construction, the history t is sequential and a permutation of h. Next we show that
t is a linearization of h by showing that t preserves the precedence order of h. Also by
construction, if $m \prec_t n$ for any two method calls $m, n \in t$, then either
$m \prec_s n$ or $m \prec_{s'} n$. Both s and $s'$ are linearizations of h and $h'$, respectively.
Therefore, for any m, n with $m <_h n$, neither $n \prec_s m$ nor $n \prec_{s'} m$ can hold, and
hence $n \prec_t m$ cannot hold either. Since t is sequential, this means that t preserves the
precedence order of h.
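
The construction of t can be illustrated with a small sketch. The Python function below is not part of the mechanized proof; it assumes that method calls are encoded as tuples such as ("push", 1), ("pop", 1), and ("pop_empty", 0), where the second component of a pop_empty tuple is a hypothetical identifier that distinguishes several pop(empty) calls. The calls of s are split into blocks at every pop(empty), the block boundaries are kept as in s, and each block is ordered as in s'.

```python
def build_t(s, s_prime):
    """Merge a pool linearization s (with pop(empty)) and a stack
    linearization s_prime (without pop(empty)) into one sequence t."""
    rank = {call: position for position, call in enumerate(s_prime)}
    t, block = [], []
    for call in s:
        if call[0] == "pop_empty":
            # Close the current block: order its calls as in s_prime,
            # then keep pop(empty) at the same block boundary as in s.
            t.extend(sorted(block, key=rank.__getitem__))
            t.append(call)
            block = []
        else:
            block.append(call)
    # Calls after the last pop(empty) form the final block.
    t.extend(sorted(block, key=rank.__getitem__))
    return t


# s is a valid pool order but violates stack order; s_prime is a stack order.
s = [("push", 1), ("push", 2), ("pop", 1), ("pop", 2),
     ("pop_empty", 0), ("push", 3), ("pop", 3)]
s_prime = [("push", 1), ("pop", 1), ("push", 2), ("pop", 2),
           ("push", 3), ("pop", 3)]
t = build_t(s, s_prime)
```

In this example the two calls that s orders in a non-stack fashion lie in the same block, so they are reordered according to s', while pop(empty) keeps its position relative to all other calls.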

Next we show that $t \in S_P$ according to the definition of the pool specification:

(1) Every method call, but pop(empty), appears in t at most once: This is guaranteed since
t is a permutation of s, and $s \in S_P$.

(2) If pop(x) appears in t, then also push(x) does and $\mathrm{push}(x) \prec_t \mathrm{pop}(x)$: again,
since t is a permutation of s and $s \in S_P$, if $\mathrm{pop}(x) \in t$, then also
$\mathrm{push}(x) \in t$. Since $\mathrm{push}(x) \prec_s \mathrm{pop}(x)$ and
$\mathrm{push}(x) \prec_{s'} \mathrm{pop}(x)$ (because both s and $s'$ are in $S_P$), it also
holds that $\mathrm{push}(x) \prec_t \mathrm{pop}(x)$, as we argued already above.

(3) $\forall x \in V.\ \mathrm{push}(x) \prec_t \mathrm{pop}(\mathrm{empty}) \Rightarrow \mathrm{pop}(x) \prec_t \mathrm{pop}(\mathrm{empty})$:
this property is satisfied trivially as all pop(empty) operations are ordered the same in t
as in s, and $s \in S_P$.

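It only remains to check that all elements are removed in a stack fashion. We have to
show the following:

$\forall x, y \in V.\ \mathrm{push}(x) \prec_t \mathrm{push}(y) \prec_t \mathrm{pop}(x) \Rightarrow \mathrm{pop}(y) \in t \wedge \mathrm{pop}(y) \prec_t \mathrm{pop}(x)$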

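First we show that if $\mathrm{push}(x) \prec_t \mathrm{push}(y) \prec_t \mathrm{pop}(x)$, then also
$\mathrm{push}(x) \prec_{s'} \mathrm{push}(y) \prec_{s'} \mathrm{pop}(x)$. We do this by showing that
there cannot exist a pop(empty) such that
$\mathrm{push}(x) \prec_t \mathrm{pop}(\mathrm{empty}) \prec_t \mathrm{push}(y)$ or
$\mathrm{push}(y) \prec_t \mathrm{pop}(\mathrm{empty}) \prec_t \mathrm{pop}(x)$.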

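Assume, towards a contradiction, $\mathrm{push}(x) \prec_t \mathrm{pop}(\mathrm{empty}) \prec_t \mathrm{push}(y)$.
By the transitivity of $\prec_t$ this implies that
$\mathrm{push}(x) \prec_t \mathrm{pop}(\mathrm{empty}) \prec_t \mathrm{pop}(x)$, which contradicts our
observation above that $t \in S_P$. Therefore
$\mathrm{push}(x) \prec_t \mathrm{pop}(\mathrm{empty}) \prec_t \mathrm{push}(y)$ is not possible, and for
the same reason also $\mathrm{push}(y) \prec_t \mathrm{pop}(\mathrm{empty}) \prec_t \mathrm{pop}(x)$ is
not possible.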

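Now, as $s' \in S_S$ and $\mathrm{push}(x) \prec_{s'} \mathrm{push}(y) \prec_{s'} \mathrm{pop}(x)$, there
has to exist a $\mathrm{pop}(y) \in s'$ with $\mathrm{pop}(y) \prec_{s'} \mathrm{pop}(x)$. For the same
reason as above it cannot be that
$\mathrm{pop}(x) \prec_s \mathrm{pop}(\mathrm{empty}) \prec_s \mathrm{pop}(y)$. Therefore pop(y) and pop(x)
are ordered in t the same as in $s'$, i.e., $\mathrm{pop}(y) \prec_t \mathrm{pop}(x)$, and therefore
$t \in S_S$.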

◄
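
The conditions checked in this proof can be stated as executable predicates on sequential histories. The following Python sketch is an illustration under assumptions: method calls are encoded as tuples ("push", x), ("pop", x), and ("pop_empty", k) with a hypothetical identifier k, and the function names are not taken from the paper or its artifact.

```python
def in_pool_spec(t):
    """Pool conditions (1)-(3) for a sequential history t (a list of calls)."""
    position = {}
    for index, call in enumerate(t):
        if call[0] == "pop_empty":
            continue
        if call in position:
            return False  # (1) a call other than pop(empty) appears twice
        position[call] = index
    end = len(t)
    for (kind, value), index in position.items():
        # (2) every pop(x) is preceded by push(x)
        if kind == "pop" and position.get(("push", value), end) > index:
            return False
    for index, call in enumerate(t):
        if call[0] != "pop_empty":
            continue
        for (kind, value), pushed_at in position.items():
            # (3) every value pushed before pop(empty) is popped before it
            if (kind == "push" and pushed_at < index
                    and position.get(("pop", value), end) > index):
                return False
    return True


def has_stack_order(t):
    """push(x) < push(y) < pop(x) implies that pop(y) exists and precedes pop(x)."""
    position = {call: index for index, call in enumerate(t)}
    values = [value for kind, value in t if kind == "push"]
    for x in values:
        popped_x = position.get(("pop", x))
        if popped_x is None:
            continue
        for y in values:
            if position[("push", x)] < position[("push", y)] < popped_x:
                popped_y = position.get(("pop", y))
                if popped_y is None or popped_y > popped_x:
                    return False
    return True


def in_stack_spec(t):
    """Stack specification: pool conditions plus stack order."""
    return in_pool_spec(t) and has_stack_order(t)
```

For instance, the sequence push(1), push(2), pop(1), pop(2) passes in_pool_spec but fails has_stack_order, which is exactly the kind of history that separates the pool specification from the stack specification.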


► Theorem 23 (Correctness of LL k-Stack). The LL k-Stack algorithm presented in its listing
is locally linearizable.

Proof. We have to show that every history h of LL k-Stack is locally linearizable with
respect to the sequential specification $S_S$ of a stack. This means that we have
to show that every thread-induced history $h_i$ of h is linearizable with respect to $S_S$ for any
thread i.
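
As an illustration of the proof obligation, the following Python sketch computes thread-induced histories from a list of completed method calls. It is a simplification under assumptions: the history is given as a list of calls rather than as invocation and response events, the thread-induced history of thread i contains the pushes of thread i and the pops of values pushed by thread i, and every pop(empty) is assumed to belong to every thread-induced history. The type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Call(NamedTuple):
    thread: int            # thread that executed the call
    kind: str              # "push" or "pop"
    value: Optional[int]   # None encodes pop(empty)


def thread_induced(history, i):
    """Calls relevant for thread i: its pushes, and pops of values it pushed."""
    inserted_by_i = {c.value for c in history
                     if c.kind == "push" and c.thread == i}
    return [c for c in history
            if (c.kind == "push" and c.thread == i)
            or (c.kind == "pop"
                and (c.value is None or c.value in inserted_by_i))]


def thread_induced_histories(history):
    """One thread-induced history per thread that pushed at least one value."""
    threads = sorted({c.thread for c in history if c.kind == "push"})
    return {i: thread_induced(history, i) for i in threads}


history = [Call(1, "push", 10), Call(2, "push", 20),
           Call(2, "pop", 10), Call(1, "pop", 20), Call(1, "pop", None)]
per_thread = thread_induced_histories(history)
```

Note that a pop belongs to the history of the thread that pushed the removed value, not to the history of the thread that executed the pop.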

Having Theorem 22, we only have to show that $h_i$ is linearizable with respect to the
sequential specification $S_P$ of a pool, and that $h'_i$, the projection
of $h_i$ to $\Sigma \setminus \{\mathrm{pop}(\mathrm{empty})\}$, is linearizable with respect to the
sequential specification $S_S$ of a stack.

We start with the proof that $h_i$ is linearizable with respect to $S_P$. We construct a
sequential history $s_i$ from $h_i$ by identifying the linearization points of the push and pop
method calls of the LL k-Stack. This means that two method calls m, n are ordered in $s_i$ as
$m \prec_{s_i} n$ if the linearization point of m is executed before the linearization point of n
in $h_i$.

The linearization point of a push method call is either the successful insertion of a new
segment or the last successful CAS which writes the element into a segment slot. The
linearization point of a pop method call is the successful CAS which removes an element
from its segment slot in line 73.

For the linearization point of pop(empty) we take the linearization point of the call to
empty in line 77. The empty method creates an atomic snapshot of the top segment.
This atomic snapshot is the state of the top segment at some point (i.e., the linearization
point of empty) within the execution of empty. If empty returns true, then there exists no
element in the atomic snapshot of the segment.

Next we show that $s_i$ is in $S_P$ according to the definition of the pool specification:


(1) Since there exists exactly one linearization point per method call, every method call,
but rem(empty), appears in $s_i$ at most once.

(2) If pop(x) appears in $s_i$, then it reads x in a slot of the top segment before its linearization
point. Since only push method calls write their elements into segment slots, there has to
exist a push(x) which wrote x into that slot. Therefore the linearization point of push(x)
is always before the linearization point of pop(x), and therefore
$\mathrm{push}(x) \prec_{s_i} \mathrm{pop}(x)$.

(3) Segments are only removed from the list of segments when they become empty. The call
to committed guarantees that elements are not inserted into segments which are about
to be removed.

A pop method calls empty only if there is a single segment left in the LL k-Stack and
no element was found in that segment in find_item.

Now assume a push(x) method call inserts an element x which is missed by find_item.
If push(x) wrote x into a segment before the linearization point of pop(empty) and
the segment was not the last segment, then the top segment changed since pop(empty)
searched for an element, and therefore the corresponding check would fail. If push(x) wrote x
into the last segment of the LL k-Stack, then a pop(x) method call removed x from the
segment, because otherwise x would be in the atomic snapshot of empty and therefore
empty would return false. Therefore, if $\mathrm{push}(x) \prec_{s_i} \mathrm{pop}(\mathrm{empty})$,
then also $\mathrm{pop}(x) \prec_{s_i} \mathrm{pop}(\mathrm{empty})$.


Therefore $s_i$ is in the sequential specification $S_P$ of a pool.

Next we show that $h'_i$ is linearizable with respect to $S_S$. We construct again a sequential
history $s'_i$ from $h'_i$ by identifying the linearization points of the push and pop method calls
of LL k-Stack.

The linearization point of a push operation is the successful insertion of a new segment,
if it is executed, or the reading of the empty slot (line 54) in the last (and therefore
successful) iteration of the main loop. The linearization point of a pop operation is the
reading of a non-empty slot (line 69) in the last (and therefore successful) iteration of the
main loop. There do not exist any pop(empty) method calls in $h'_i$. Since we assume a
sequentially consistent memory model, these read operations define a total order on the LL
k-Stack method calls in $h'_i$.

First we show that $s'_i$ is in the sequential specification $S_P$ of a pool:


(1) Since there exists exactly one linearization point per method call, every method call
appears in $s'_i$ at most once.

(2) If pop(x) appears in $s'_i$, then it read x in a slot of the top segment at its linearization
point. Since only push operations write their elements into segment slots, there has
to exist a push(x) which wrote x into that slot. The linearization point of push(x) is
always before x is written into a segment slot. Therefore
$\mathrm{push}(x) \prec_{s'_i} \mathrm{pop}(x)$.

(3) Since there exist no pop(empty) operations in $s'_i$, the third pool condition is trivially
satisfied.

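Next we show that $s'_i$ also provides a stack order, which means that we have to show
that


$\forall x, y \in V.\ \mathrm{push}(x) \prec_{s'_i} \mathrm{push}(y) \prec_{s'_i} \mathrm{pop}(x) \Rightarrow \mathrm{pop}(y) \in s'_i \wedge \mathrm{pop}(y) \prec_{s'_i} \mathrm{pop}(x)$.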


We start by observing some invariants.

1. A thread never inserts elements into the same segment twice. This is guaranteed by the
call to segment_is_marked.

2. Between the linearization point of a push and the time it writes its element into a segment,
the segment the element gets written into is not removed: if the push operation inserts
a new segment, this is trivially correct. If the push operation writes the element into an
existing segment, then the call to committed in line 59 guarantees that the segment was
not removed.

3. At the time of the linearization point of the pop, which is the time when the pop reads
the non-empty slot (line 69) in the last (and therefore successful) iteration, the pop reads
the non-empty slot from the top segment. This is guaranteed by the check in line 70.


Now assume there exist the operations push(x), push(y) and pop(x) in $s'_i$ and
$\mathrm{push}(x) \prec_{s'_i} \mathrm{push}(y) \prec_{s'_i} \mathrm{pop}(x)$. Since push(x) and push(y)
are both in $s'_i$, this means that both operations are executed by the same thread. Therefore,
according to Invariant 1, x and y get inserted into different segments, with the segment of y
on top of the segment of x.

The linearization point of pop(x) cannot be before y is written into its segment, because
according to Invariant 2 the segment y gets inserted into does not get removed between the
linearization point of push(y) and the time y is written into the segment. With Invariant
3 this means that x is inaccessible for pop(x) before y gets written into a segment. Also
because of the third invariant, the top segment changes between the insertion of y and the
linearization point of pop(x).

Next we observe that as long as y is not removed, no segment below the segment of y can
become the top segment. Therefore, for the segment of x to become the top segment so that
pop(x) can remove it, y has to be removed first. Only a pop(y) can remove y, and therefore
there exists a pop(y), and the linearization point of pop(y) is before the linearization point
of pop(x).
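
The role of the invariants can be illustrated with a sequential toy model. The Python sketch below is not the concurrent algorithm from the listing: it has no CAS operations, no bound k on the segment size, and no interleavings, and all names are hypothetical. It only mirrors Invariant 1 (a thread never inserts into the same segment twice), Invariant 3 (elements are taken from the top segment only), and the rule that segments are removed only when they are empty. Under these assumptions it checks that each removed value is the most recent value of its inserting thread that has not been removed yet, which is the stack order of the corresponding thread-induced history.

```python
import random


class ToySegmentStack:
    """Sequential toy model: a list of segments, the top segment is last."""

    def __init__(self):
        self.segments = []

    def push(self, thread, value):
        top = self.segments[-1] if self.segments else None
        if top is None or thread in top["marked"]:
            # Mirrors Invariant 1: a marked thread gets a new top segment.
            top = {"items": [], "marked": set()}
            self.segments.append(top)
        top["items"].append((thread, value))
        top["marked"].add(thread)

    def pop(self, rng):
        # Segments are removed only once they are empty.
        while self.segments and not self.segments[-1]["items"]:
            self.segments.pop()
        if not self.segments:
            return None  # pop(empty)
        items = self.segments[-1]["items"]
        # Mirrors Invariant 3: take an arbitrary element of the top segment.
        return items.pop(rng.randrange(len(items)))


def run(seed, steps=2000, threads=4):
    """Return True if every thread-induced order observed is a stack order."""
    rng = random.Random(seed)
    stack = ToySegmentStack()
    live = {thread: [] for thread in range(threads)}  # unremoved values
    next_value = 0
    for _ in range(steps):
        if rng.random() < 0.55:
            thread = rng.randrange(threads)
            stack.push(thread, next_value)
            live[thread].append(next_value)
            next_value += 1
            continue
        popped = stack.pop(rng)
        if popped is None:
            if any(live.values()):
                return False  # pop(empty) although a value is still stored
            continue
        owner, value = popped
        if live[owner][-1] != value:
            return False  # a younger value of the same thread is still stored
        live[owner].pop()
    return True
```

A call such as all(run(seed) for seed in range(20)) exercises the model on several random schedules. Values of different threads may still be removed out of stack order, which is the relaxation that local linearizability permits.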

Hence $s'_i$ is in the sequential specification of a stack. Using Theorem 22 this means that
the LL k-Stack is locally linearizable with respect to the sequential specification of
a stack.


H Additional Experiments

We also evaluate the implementations on another Scal workload, the sequential alternating
workload. Note, however, that in this workload the threads of the locally linearizable
implementations only access their local backends, so their very good performance is expected.

Mixed Workload. In order to evaluate the performance and scalability of mixed workloads,
i.e., workloads where threads produce and consume values, we exercise the so-called
sequential alternating workload in Scal. Each thread is configured to execute $10^6$ pairs of
insert and remove operations, i.e., each insert operation is followed by a remove operation.
As in the producer-consumer workload, the contention is controlled by adding a busy wait
of 5 µs. The number of threads is configured to range between 1 and 80. Again we report
the number of data structure operations per second.
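
The shape of this workload can be sketched as follows. The Python code below is not the Scal benchmark: it uses queue.Queue as a stand-in for the data structure under test, it assumes that the busy wait is placed after every operation, and it defaults to far fewer pairs than the experiment. Because of the global interpreter lock it illustrates the access pattern only and is not suitable for measuring scalability. All function names are hypothetical.

```python
import queue
import threading
import time


def busy_wait(seconds):
    """Spin until the given time has elapsed (contention-control delay)."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass


def worker(container, pairs, delay, counts, index):
    done = 0
    for item in range(pairs):
        container.put((index, item))   # insert ...
        busy_wait(delay)
        container.get()                # ... directly followed by a remove
        busy_wait(delay)
        done += 2
    counts[index] = done


def sequential_alternating(threads=4, pairs=1000, delay=5e-6):
    """Run the workload and return (operations, operations per second)."""
    container = queue.Queue()
    counts = [0] * threads
    pool = [threading.Thread(target=worker,
                             args=(container, pairs, delay, counts, i))
            for i in range(threads)]
    start = time.perf_counter()
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    elapsed = time.perf_counter() - start
    total = sum(counts)
    return total, total / elapsed
```

Since every remove of a thread follows an insert of the same thread, the container always holds an element for each pending remove, so the blocking get in this sketch cannot wait forever.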

Data structures that require parameters to be set are configured as in the producer-consumer
benchmark. The accompanying figure shows the results of the mixed workload benchmark for all
considered data structures.

The MS queue and Treiber stack neither perform well nor scale beyond 10 threads.
As in the producer-consumer benchmark, LCRQ and TS Stack either perform competitively
with their k-out-of-order relaxed counterparts k-FIFO and k-Stack or even outperform and
outscale them (LCRQ even outperforms the pool).

LL+D MS queue, LLD LCRQ, and LL+D Treiber stack perform very well and scale
(nearly) linearly in the number of threads. A surprising result is that LLD k-FIFO performs
poorly in this experiment. The reason is that k-FIFO performs poorly when it is almost
empty, and in this experiment each backend instance of LLD k-FIFO contains at most one
element at any point in time. The k-Stack performs better on a nearly-empty state. The
benefit of trying to perform a local operation first in the LLD algorithms is visible when
comparing to 1-RA DQ and DS that do not utilize a local fast path.

I Verifying Local Linearizability

In general, verifying local linearizability amounts to verifying linearizability for a set of
smaller histories. This might enable verification in a modular/compositional way. Aside
from this, it is important to mention (again) that for our locally linearizable data structures
built from linearizable building blocks, the correctness proofs are straightforward
assuming the building blocks are proven to be linearizable. In addition, for queues we can
state an "axiomatic" verification theorem for local linearizability in the style of 1211 uni,
whose main theorem we recall next (with a slight reformulation).

► Theorem 24 (Queue Linearizability). A queue concurrent history h is linearizable with respect
to the queue sequential specification $S_Q$ if and only if

1. h is linearizable with respect to the pool sequential specification $S_P$ (with suitable
renaming of method calls), and

2. $\forall x, y \in V.\ \mathrm{enq}(x) <_h \mathrm{enq}(y) \wedge \mathrm{deq}(y) \in h \Rightarrow \mathrm{deq}(x) \in h \wedge \mathrm{deq}(y) \not<_h \mathrm{deq}(x)$. ◄

We note that an analogous change to the axioms in the sequential specification of a pool 
and a stack does not lead to a characterisation of linearizability for pools and stacks, cf. m- 
An axiomatic characterisation of linearizability for pools and stacks would involve an infinite 
number of axioms/infinite axioms, due to the need to prohibit infinitely many problematic 
shapes, cf. [7].

We are now able to state the queue-local-linearizability-verification result.

► Theorem 25 (Queue Local Linearizability). A queue concurrent history h is locally
linearizable with respect to the queue sequential specification $S_Q$ if and only if

1. h is locally linearizable with respect to the pool sequential specification $S_P$ (after
suitable renaming of method calls), and

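2. $\forall x, y \in V.\ \forall i.\ \mathrm{enq}(x) <_h^i \mathrm{enq}(y) \wedge \mathrm{deq}(y) \in h \Rightarrow \mathrm{deq}(x) \in h \wedge \mathrm{deq}(y) \not<_h \mathrm{deq}(x)$. ◄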
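
The second condition of Theorems 24 and 25 can be checked mechanically on a finite history. The following Python sketch is an illustration under assumptions: each completed operation carries hypothetical invocation and response times, the precedence order holds when one operation returns before the other is invoked, values are unique, and the thread-indexed order of Theorem 25 is read as relating only enqueues performed by the same thread. Dequeues that return empty are not modelled.

```python
from typing import NamedTuple


class Op(NamedTuple):
    thread: int
    kind: str      # "enq" or "deq"
    value: int
    inv: float     # invocation time
    res: float     # response time


def precedes(a, b):
    """Precedence order: a returns before b is invoked."""
    return a.res < b.inv


def condition_2(history, local=False):
    """Check the order condition; local=True restricts it to same-thread enqueues."""
    enq = {op.value: op for op in history if op.kind == "enq"}
    deq = {op.value: op for op in history if op.kind == "deq"}
    for x, enq_x in enq.items():
        for y, enq_y in enq.items():
            if not precedes(enq_x, enq_y):
                continue
            if local and enq_x.thread != enq_y.thread:
                continue  # assumed reading of the thread-indexed order
            if y in deq and (x not in deq or precedes(deq[y], deq[x])):
                return False
    return True


# Thread 1 enqueues 1, then thread 2 enqueues 2, but 2 is dequeued first.
history = [Op(1, "enq", 1, 0.0, 1.0), Op(2, "enq", 2, 2.0, 3.0),
           Op(1, "deq", 2, 4.0, 5.0), Op(2, "deq", 1, 6.0, 7.0)]
globally_ok = condition_2(history)
locally_ok = condition_2(history, local=True)
```

In this example the global condition fails because the two enqueues are ordered and the dequeues occur in the opposite order, while the local condition holds because the enqueues belong to different threads.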



